Training a Custom Dataset with PatchCore in Anomalib

2301_77169879 · Published 2026-05-21 14:38:21

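I. Introduction: What Are Anomalib and PatchCore?

Industrial visual inspection often runs into the same problem: normal samples are plentiful, abnormal samples are scarce, and the types of anomaly are not even fixed. Training YOLO, a classifier, or a segmentation model directly requires many defect samples plus manual annotation. On a real production line (for example, a notch in the tip of a self-tapping screw) defect samples are rare to begin with, and many anomaly shapes cannot be enumerated in advance. Anomaly detection is a better fit for this kind of scenario.

Anomalib is a deep learning library for visual anomaly detection. Its stated goal is to collect and implement state-of-the-art anomaly detection algorithms and to support training, inference, evaluation, and deployment on both public and private datasets. It supports PatchCore, PaDiM, FastFlow, STFPM, EfficientAD, and other algorithms. The official README also states that Anomalib focuses on anomaly detection and localization in images and videos and provides modules for training, inference, benchmarking, and hyperparameter optimization. (Source: the Anomalib GitHub repository)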

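PatchCore is one of the classic methods in the library. It does not train a bounding-box detector the way YOLO does, nor is it an ordinary classification network. Instead, it uses a pretrained CNN to extract local patch features from normal images and stores them in a memory bank. At test time it compares the patch features of the test image against that memory bank by nearest-neighbor distance: the larger the distance, the more likely the patch is anomalous. (Documentation: anomalib.readthedocs.io)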
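The memory-bank idea can be illustrated with a toy sketch. This is not Anomalib's implementation: the 2-D tuples stand in for CNN patch features, the function names are made up for illustration, and coreset subsampling and PatchCore's score reweighting are left out.

```python
import math


def build_memory_bank(normal_patch_features):
    """Store every patch feature taken from normal images (no subsampling here)."""
    return [tuple(feature) for feature in normal_patch_features]


def patch_score(feature, memory_bank):
    """Distance from one test patch to its nearest neighbour in the memory bank."""
    return min(math.dist(feature, stored) for stored in memory_bank)


def image_score(test_patch_features, memory_bank):
    """Simplified image-level score: the largest patch score in the image."""
    return max(patch_score(feature, memory_bank) for feature in test_patch_features)


# Toy 2-D "features"; real features come from intermediate CNN layers.
bank = build_memory_bank([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
normal_image = [(0.1, 0.0), (0.9, 0.1)]   # every patch lies close to the bank
defect_image = [(0.1, 0.0), (3.0, 4.0)]   # one patch lies far from the bank

print("normal image score:", image_score(normal_image, bank))
print("defect image score:", image_score(defect_image, bank))
```

The per-patch scores, arranged back on the image grid, are what becomes the anomaly heat map.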

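PatchCore suits the following scenarios:

Many normal samples and few abnormal samples
Defect types that are not fixed and are hard to classify in advance
Industrial tasks such as surface defects, structural damage, foreign objects, notches, scratches, and breakage
You need an image-level normal/abnormal decision and also want an anomaly heat map

When normal samples are easy to collect, defects are varied and hard to enumerate, and precise localization is required, PatchCore is a good first choice.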

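II. Create a Virtual Environment and Download the Anomalib Source

These steps assume Anaconda is installed; you can also run them in a virtual environment from the VS Code terminal.

Python 3.10 is recommended. Many Anomalib examples and third-party tutorials use Python 3.10, and compatibility is relatively stable.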

conda create -n anomalib python=3.10 -y
conda activate anomalib

Anomalib can be installed directly with pip or from the GitHub source.

Anomalib source: https://github.com/open-edge-platform/anomalib

You can download the zip archive or clone the repository from the command line:

git clone https://github.com/open-edge-platform/anomalib.git

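III. Configure the Anomalib Virtual Environment

1. Install Anomalib

Change into the Anomalib source directory and install it in editable mode. Assuming the source was downloaded to D:\anomalib:

cd D:\anomalib
pip install -e .

The -e flag means editable mode, i.e. a development install: Python uses the current source directory directly, so changes to the source take effect without reinstalling. If you are only learning or debugging, or the machine has no NVIDIA GPU, get the pipeline working with the CPU build first.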

2. Configure the PyTorch environment

For a tutorial, see: Configuring a PyTorch environment in Anaconda (CPU and GPU), CSDN blog

To check whether your PyTorch build is CPU or GPU, run:

python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.is_available())"

If the output shows:

torch.version.cuda = None
torch.cuda.is_available() = False

then the PyTorch in the current environment is the CPU build, and Anomalib cannot use the GPU even if it is configured to.
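The same check can be folded into a small helper that falls back to the CPU when CUDA is unusable. This is only a sketch: choose_accelerator and cuda_is_available are illustrative names, not part of Anomalib or PyTorch, and a missing PyTorch install is treated as "no CUDA".

```python
def choose_accelerator(requested: str, cuda_available: bool) -> str:
    """Return "gpu" only when it was requested and CUDA is actually usable."""
    if requested.strip().lower() == "gpu" and cuda_available:
        return "gpu"
    return "cpu"


def cuda_is_available() -> bool:
    """Ask PyTorch whether CUDA works; a missing install counts as no CUDA."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


print("accelerator:", choose_accelerator("gpu", cuda_is_available()))
```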

3. Verify the Anomalib virtual environment

Run:

python -c "import anomalib; print(anomalib.__version__)"

If the version number prints normally, Anomalib can be imported.

Then check the command-line tool:

anomalib --help

If you see commands such as train, predict, and benchmark, the CLI is basically working.
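If you prefer one script over several shell commands, a standard-library check along these lines reports whether each package is importable and which version is installed. describe_package is an illustrative helper, and the package list is an assumption you can extend.

```python
from importlib import metadata, util


def describe_package(name: str) -> str:
    """Report whether a top-level package can be imported and, if known, its version."""
    if util.find_spec(name) is None:
        return f"{name}: not importable"
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = "unknown version"
    return f"{name}: {version}"


for package in ("anomalib", "torch"):
    print(describe_package(package))
```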

IV. Custom Dataset

1. Recommended data structure

For PatchCore, the training stage normally needs only normal images. The recommended structure is:

D:\patchcore\MVTec\luosi_dataset\ls
├── train
│   └── good
│       ├── 0000.jpg
│       ├── 0001.jpg
│       └── ...
├── test
│   ├── good
│   │   ├── 0000.jpg
│   │   ├── 0001.jpg
│   │   └── ...
│   └── abnormal
│       ├── 0000.jpg
│       ├── 0001.jpg
│       └── ...
└── ground_truth
    └── abnormal
        ├── 0000.png
        ├── 0001.png
        └── ...

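Where:

train/good is for training and holds only normal samples
test/good is for testing and holds normal test samples
test/abnormal is for testing and holds abnormal samples
ground_truth is optional and only needed if you have real pixel-level defect masks

If you have no real masks, do not generate all-black masks yourself and then trust the pixel-level metrics. An all-black mask tells the program that the abnormal image contains no abnormal pixels, so pixel_AUROC and pixel_F1Score may carry no meaning.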

For tasks that only need an image-level normal/abnormal decision, you can use just:

train/good
test/good
test/abnormal
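Before training, it can help to confirm that a dataset really follows this layout. The sketch below uses only the standard library; check_layout and count_images are illustrative names, and the demo builds a throwaway directory of empty files rather than reading a real dataset.

```python
import tempfile
from pathlib import Path

IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def count_images(folder: Path) -> int:
    """Count image files under a folder; a missing folder counts as zero."""
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in IMG_EXTS)


def check_layout(category_root: Path) -> dict[str, int]:
    """Count images per split and fail if train/good, the only required split, is empty."""
    counts = {rel: count_images(category_root / rel)
              for rel in ("train/good", "test/good", "test/abnormal")}
    if counts["train/good"] == 0:
        raise RuntimeError(f"train/good has no images under {category_root}")
    return counts


# Demo on a temporary directory filled with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ls"
    for rel, n in (("train/good", 3), ("test/good", 1), ("test/abnormal", 2)):
        (root / rel).mkdir(parents=True)
        for i in range(n):
            (root / rel / f"{i:04d}.jpg").touch()
    print(check_layout(root))
```

For a real dataset, pass the category directory (for example the ls folder shown above) to check_layout.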

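2. Generate the PatchCore dataset structure automatically

If your raw data currently sits in two folders:

D:\patchcore\MVTec\good    normal images
D:\patchcore\MVTec\ab      abnormal images

the following script organizes them into the train/good, test/good, test/abnormal structure.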

import shutil
import random
from pathlib import Path

# =========================
# 1. Configuration
# =========================

# Directory of raw normal images
GOOD_DIR = Path(r"D:\patchcore\MVTec\good")

# Directory of raw abnormal images
ABNORMAL_DIR = Path(r"D:\patchcore\MVTec\ab")

# Output dataset root
OUT_ROOT = Path(r"D:\patchcore\MVTec\luosi_dataset")

# Category name
CATEGORY_NAME = "ls"

# Fraction of good images used for training
TRAIN_GOOD_RATIO = 0.8

# Whether to clear an existing dataset first
CLEAR_OLD_DATASET = True

# Random seed, so the split is the same on every run
RANDOM_SEED = 42

# Supported image extensions
IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


# =========================
# 2. Utility functions
# =========================

def collect_images(folder: Path):
    """Recursively collect image paths."""
    if not folder.exists():
        raise FileNotFoundError(f"Directory does not exist: {folder}")

    images = []
    for p in folder.rglob("*"):
        if p.is_file() and p.suffix.lower() in IMG_EXTS:
            images.append(p)

    images = sorted(images)

    if len(images) == 0:
        raise RuntimeError(f"No images found in directory: {folder}")

    return images


def safe_copy(src: Path, dst: Path):
    """Copy an image, creating parent directories as needed."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)


def reset_dir(path: Path):
    """Delete a directory and recreate it empty."""
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True, exist_ok=True)


# =========================
# 3. Main flow
# =========================

def main():
    random.seed(RANDOM_SEED)

    category_root = OUT_ROOT / CATEGORY_NAME

    train_good_dir = category_root / "train" / "good"
    test_good_dir = category_root / "test" / "good"
    test_abnormal_dir = category_root / "test" / "abnormal"

    if CLEAR_OLD_DATASET:
        reset_dir(category_root)
    else:
        train_good_dir.mkdir(parents=True, exist_ok=True)
        test_good_dir.mkdir(parents=True, exist_ok=True)
        test_abnormal_dir.mkdir(parents=True, exist_ok=True)

    good_images = collect_images(GOOD_DIR)
    abnormal_images = collect_images(ABNORMAL_DIR)

    # Shuffle, then take the first TRAIN_GOOD_RATIO share of good images for training.
    random.shuffle(good_images)

    train_num = int(len(good_images) * TRAIN_GOOD_RATIO)
    train_good_images = good_images[:train_num]
    test_good_images = good_images[train_num:]

    if len(train_good_images) == 0:
        raise RuntimeError("train/good is empty; add more good images or raise TRAIN_GOOD_RATIO.")

    if len(test_good_images) == 0:
        raise RuntimeError("test/good is empty; lower TRAIN_GOOD_RATIO.")

    # Copy each split, renaming files to 0000, 0001, ... with the original extension.
    for idx, img_path in enumerate(train_good_images):
        dst_name = f"{idx:04d}{img_path.suffix.lower()}"
        safe_copy(img_path, train_good_dir / dst_name)

    for idx, img_path in enumerate(test_good_images):
        dst_name = f"{idx:04d}{img_path.suffix.lower()}"
        safe_copy(img_path, test_good_dir / dst_name)

    for idx, img_path in enumerate(abnormal_images):
        dst_name = f"{idx:04d}{img_path.suffix.lower()}"
        safe_copy(img_path, test_abnormal_dir / dst_name)

    print("PatchCore dataset generated.")
    print(f"Output directory: {category_root}")
    print()
    print("Statistics:")
    print(f"good total: {len(good_images)}")
    print(f"abnormal total: {len(abnormal_images)}")
    print(f"train/good: {len(train_good_images)}")
    print(f"test/good: {len(test_good_images)}")
    print(f"test/abnormal: {len(abnormal_images)}")


if __name__ == "__main__":
    main()

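V. Complete PatchCore Training Script

Below is a complete train_patchcore_auto.py that uses the Anomalib Python API to train on a custom dataset and export a Torch model. It also organizes the data into the required layout itself, so you only need to supply the paths to the normal and abnormal images. If a dataset already exists under the same root directory, it is reused rather than regenerated.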

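# train_patchcore_auto.py
r"""
Purpose:
    This script trains an industrial anomaly detection model with Anomalib PatchCore.
    It handles both the "good only" and the "good + abnormal" data situations.

Core features:
    1. Only GOOD_DIR, the path to the normal images, is required. (If you have abnormal images, set ABNORMAL_DIR; otherwise set it to None.)
    2. Good images are split automatically into train/good and test/good. (If abnormal images exist, the script copies them to test/abnormal and generates all-black placeholder masks.)
    3. PatchCore is trained automatically.
    4. With abnormal images, the script runs the full anomaly detection evaluation. With good images only, it skips the full evaluation but can run a normal-only false-alarm check.
    5. The model is exported in Torch format.

Three run modes:
    Mode A: good images only, and too few of them to split off test/good.
        Runs: training + export. (Skips: evaluation.)

    Mode B: good images only, but enough to split into train/good and test/good.
        Runs: training + normal-only inference check + export. (Skips: full anomaly detection evaluation.)

    Mode C: both good and abnormal images.
        Runs: training + full anomaly detection evaluation + export. (Note: abnormal images are used only for testing, never for training.)

Important notes:
    PatchCore needs only normal samples for training.
    Abnormal samples are not required for training; they are needed to evaluate how well the model separates normal from abnormal.
    With only test/good, you can check whether normal samples trigger false alarms, but you cannot compute full AUROC, F1, Recall, and similar metrics.
    If the abnormal images have no real masks, the all-black masks this script generates are placeholders and must not be used to judge pixel-level localization.
    If you only care about the image-level normal/abnormal decision, you can get the pipeline running without masks. A serious evaluation of pixel-level localization requires real masks.

Recommended raw input:
    D:\patchcore\raw\good
        001.jpg
    D:\patchcore\raw\abnormal
        001.jpg

Structure after automatic generation:
    OUT_ROOT / CATEGORY_NAME
        ├── train
        │   └── good
        ├── test
        │   ├── good
        │   └── abnormal
        └── ground_truth
            └── abnormal
Usage:
    1. Set GOOD_DIR.
    2. If you have abnormal images, set ABNORMAL_DIR.
    3. If you have none, set ABNORMAL_DIR = None.
    4. Set OUT_ROOT, CATEGORY_NAME, and EXPORT_BASE.
    5. Run python train_patchcore_auto.py.
"""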

import os
import shutil
import random
from pathlib import Path

from PIL import Image
import numpy as np
import torch

from anomalib.data import Folder
from anomalib.deploy import ExportType
from anomalib.engine import Engine
from anomalib.models import Patchcore


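# =========================
# 1. Data path configuration
# =========================

GOOD_DIR = Path(r"D:\patchcore\MVTec\good")                  # Directory of raw normal images.
ABNORMAL_DIR = Path(r"D:\patchcore\MVTec\ab")                # Directory of raw abnormal images; set to None if there are none.

OUT_ROOT = Path(r"D:\anomalib\data")                         # Output root for the dataset.
CATEGORY_NAME = "ls"                                        # Product category; the final directory is OUT_ROOT / CATEGORY_NAME.

REUSE_EXISTING_DATASET = True                               # If the ls dataset already exists, train on it without overwriting.
CLEAR_OLD_DATASET = False                                   # Clearing by default is discouraged, to avoid deleting a dataset by mistake.
TRAIN_GOOD_RATIO = 0.8                                      # Used to split good images only when no dataset exists yet.
RANDOM_SEED = 42                                            # Random seed.
IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}  # Supported image extensions.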

def increment_path(base_dir: str | Path, name: str = "exp") -> Path:
    """Return base_dir/name, or base_dir/name2, name3, ... if that path already exists."""
    base_dir = Path(base_dir)
    path = base_dir / name

    if not path.exists():
        return path

    i = 2
    while True:
        new_path = base_dir / f"{name}{i}"
        if not new_path.exists():
            return new_path
        i += 1

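# =========================
# 2. PatchCore training configuration
# =========================

IMAGE_SIZE = (128, 128)                      # Image size fed to the model.
MAX_EPOCHS = 1                               # One epoch is normally enough for PatchCore.
TRAIN_DEVICE = "cpu"                         # Either "cpu" or "gpu".
GPU_DEVICES = 1                              # Number of GPUs to use; only read when TRAIN_DEVICE is "gpu" (0 is not a valid device count).
TRAIN_BATCH_SIZE = 4                         # Training batch size.
EVAL_BATCH_SIZE = 4                          # Test batch size.
NUM_WORKERS = 0                              # Start with 0 on Windows.

BACKBONE = "wide_resnet50_2"                                # Feature extraction backbone.
LAYERS = ["layer2", "layer3"]                               # Intermediate layers used by PatchCore.
PRE_TRAINED = True                                          # Whether to load pretrained weights.
CORESET_SAMPLING_RATIO = 0.02                               # Share of normal features kept in the memory bank.
NUM_NEIGHBORS = 9                                           # Number of nearest neighbors.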
RUN_NORMAL_ONLY_CHECK = True                                # Whether to run the false-alarm check when only test/good exists.

EXPORT_BASE = r"D:\anomalib\exports\ls_patchcore\weights"    # Base directory for exported models.
EXPORT_ROOT = increment_path(EXPORT_BASE, name="exp")        # Export directory for this run: exp, exp2, exp3, ...

# =========================
# 3. Utility functions
# =========================
def clean_path_str(path: str) -> str:
    bad_chars = ["\u200b", "\u200c", "\u200d", "\ufeff", "\u00a0"]  # Common invisible characters.
    for char in bad_chars:
        path = path.replace(char, "")  # Remove invisible characters.
    return path.strip()  # Strip surrounding whitespace.


def normalize_optional_path(path_value):
    if path_value is None:
        return None  # None means the user gave no path.
    path_str = clean_path_str(str(path_value))  # Convert to a string and clean it.
    if path_str == "":
        return None  # An empty string also counts as no path.
    return Path(path_str)  # Return a Path object.


def collect_images(folder: Path, allow_empty: bool = False) -> list[Path]:
    folder = normalize_optional_path(folder)  # Normalize the path object.

    if folder is None:
        return []  # No path given: return an empty list.

    if not folder.exists():
        if allow_empty:
            return []  # Do not raise when an empty result is allowed.
        raise FileNotFoundError(f"Directory does not exist: {folder}")  # A required directory is missing.

    images = []  # Collected image paths.
    for path in folder.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMG_EXTS:
            images.append(path)  # Keep files with a supported extension.

    images = sorted(images)  # Sort for a stable order.

    if len(images) == 0 and not allow_empty:
        raise RuntimeError(f"No images found in directory: {folder}")  # Required images are missing.

    return images  # Return the list of image paths.


def reset_dir(path: Path) -> None:
    if path.exists():
        shutil.rmtree(path)  # Delete the old directory.
    path.mkdir(parents=True, exist_ok=True)  # Create a fresh one.


def safe_copy(src: Path, dst: Path) -> None:
    dst.parent.mkdir(parents=True, exist_ok=True)  # Create the target parent directory.
    shutil.copy2(src, dst)  # Copy the image and keep its metadata.


def make_black_mask_like_image(img_path: Path, mask_path: Path) -> None:
    mask_path.parent.mkdir(parents=True, exist_ok=True)  # Create the mask directory.

    with Image.open(img_path) as img:
        width, height = img.size  # Read the width and height of the source image.

    mask = np.zeros((height, width), dtype=np.uint8)  # Create an all-black single-channel mask.
    Image.fromarray(mask).save(mask_path)  # Save the mask image.


def directory_has_files(path: Path) -> bool:
    if not path.exists():
        return False  # A missing directory has no files.
    return any(p.is_file() for p in path.rglob("*"))  # True if any file exists under the directory.


# =========================
# 4. Build the Folder dataset automatically
# =========================

def count_images_in_dir(folder: Path) -> int:
    if not folder.exists():
        return 0  # A missing directory has zero images.
    return sum(1 for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in IMG_EXTS)  # Count images.

def inspect_existing_dataset(dataset_root: Path) -> tuple[bool, bool, bool]:
    train_good_dir = dataset_root / "train" / "good"  # Normal training directory of an organized dataset.
    test_good_dir = dataset_root / "test" / "good"  # Normal test directory of an organized dataset.
    test_abnormal_dir = dataset_root / "test" / "abnormal"  # Abnormal test directory of an organized dataset.

    has_train_good = count_images_in_dir(train_good_dir) > 0  # Are there training good images?
    has_test_good = count_images_in_dir(test_good_dir) > 0  # Are there test good images?
    use_abnormal = count_images_in_dir(test_abnormal_dir) > 0  # Are there test abnormal images?

    return has_train_good, has_test_good, use_abnormal  # State of the existing dataset.

def build_auto_dataset() -> tuple[Path, bool, bool]:
    random.seed(RANDOM_SEED)  # Fix the random split.

    dataset_root = OUT_ROOT / CATEGORY_NAME  # Final training dataset directory.

    if REUSE_EXISTING_DATASET and dataset_root.exists():
        has_train_good, has_test_good, use_abnormal = inspect_existing_dataset(dataset_root)  # Inspect the existing dataset.

        if has_train_good:
            print("=" * 80)
            print("Existing dataset detected; skipping the organizing step and training on it directly.")
            print(f"Dataset directory: {dataset_root}")
            print(f"train/good images: {count_images_in_dir(dataset_root / 'train' / 'good')}")
            print(f"test/good images: {count_images_in_dir(dataset_root / 'test' / 'good')}")
            print(f"test/abnormal images: {count_images_in_dir(dataset_root / 'test' / 'abnormal')}")
            print(f"test/good present: {has_test_good}")
            print(f"abnormal present: {use_abnormal}")
            print("=" * 80)

            return dataset_root, has_test_good, use_abnormal  # Return the existing dataset as is.

        print(f"Directory exists but has no train/good images; reorganizing: {dataset_root}")  # Rebuild when empty or incomplete.

    train_good_dir = dataset_root / "train" / "good"  # Normal training directory.
    test_good_dir = dataset_root / "test" / "good"  # Normal test directory.
    test_abnormal_dir = dataset_root / "test" / "abnormal"  # Abnormal test directory.
    gt_abnormal_dir = dataset_root / "ground_truth" / "abnormal"  # Abnormal mask directory.

    if CLEAR_OLD_DATASET:
        reset_dir(dataset_root)  # Clear the old dataset only when explicitly allowed.
    else:
        dataset_root.mkdir(parents=True, exist_ok=True)  # Keep existing content; just make sure the directory exists.

    good_images = collect_images(GOOD_DIR, allow_empty=False)  # Collect raw good images.
    abnormal_images = collect_images(ABNORMAL_DIR, allow_empty=True)  # Collect raw abnormal images.

    random.shuffle(good_images)  # Shuffle the good images.

    if len(good_images) == 1:
        train_good_images = good_images  # A single good image can only be used for training.
        test_good_images = []  # No test good images.
    else:
        train_num = int(len(good_images) * TRAIN_GOOD_RATIO)  # Number of training images.
        train_num = max(1, min(train_num, len(good_images) - 1))  # Keep at least one image in both train and test.
        train_good_images = good_images[:train_num]  # Training split.
        test_good_images = good_images[train_num:]  # Test split.

    for idx, img_path in enumerate(train_good_images):
        dst_path = train_good_dir / f"{idx:04d}{img_path.suffix.lower()}"  # Target path for the training image.
        safe_copy(img_path, dst_path)  # Copy into train/good.

    for idx, img_path in enumerate(test_good_images):
        dst_path = test_good_dir / f"{idx:04d}{img_path.suffix.lower()}"  # Target path for the test good image.
        safe_copy(img_path, dst_path)  # Copy into test/good.

    use_abnormal = len(abnormal_images) > 0  # Are there abnormal images?
    has_test_good = len(test_good_images) > 0  # Are there test good images?

    if use_abnormal:
        for idx, img_path in enumerate(abnormal_images):
            dst_img_path = test_abnormal_dir / f"{idx:04d}{img_path.suffix.lower()}"  # Target path for the abnormal image.
            safe_copy(img_path, dst_img_path)  # Copy the abnormal image.

            mask_path = gt_abnormal_dir / f"{idx:04d}.png"  # Mask path with the same stem as the image.
            make_black_mask_like_image(img_path, mask_path)  # Generate an all-black placeholder mask.

    print("=" * 80)
    print("Dataset organized.")
    print(f"Dataset directory: {dataset_root}")
    print(f"good total: {len(good_images)}")
    print(f"abnormal total: {len(abnormal_images)}")
    print(f"train/good: {len(train_good_images)}")
    print(f"test/good: {len(test_good_images)}")
    print(f"test/abnormal: {len(abnormal_images)}")
    print(f"test/good present: {has_test_good}")
    print(f"abnormal present: {use_abnormal}")
    print("=" * 80)

    return dataset_root, has_test_good, use_abnormal  # State of the newly organized dataset.


# =========================
# 5. Build the Anomalib components
# =========================

def build_engine() -> Engine:
    device = clean_path_str(TRAIN_DEVICE).lower()  # Normalize the device string.

    if device == "gpu" and torch.cuda.is_available():
        print(f"Training device: GPU, count: {GPU_DEVICES}")  # Report the GPU setup.
        return Engine(max_epochs=MAX_EPOCHS, accelerator="gpu", devices=GPU_DEVICES)  # Create a GPU engine.

    if device == "gpu" and not torch.cuda.is_available():
        print("Warning: GPU requested, but this PyTorch build has no CUDA support; falling back to CPU.")  # GPU unavailable.

    print("Training device: CPU")  # Report the CPU setup.
    return Engine(max_epochs=MAX_EPOCHS, accelerator="cpu", devices=1)  # Create a CPU engine.


def build_model() -> Patchcore:
    return Patchcore(
        backbone=BACKBONE,  # Backbone network.
        layers=LAYERS,  # Feature layers.
        pre_trained=PRE_TRAINED,  # Load pretrained weights.
        coreset_sampling_ratio=CORESET_SAMPLING_RATIO,  # Memory bank sampling ratio.
        num_neighbors=NUM_NEIGHBORS,  # Number of nearest neighbors.
        pre_processor=Patchcore.configure_pre_processor(image_size=IMAGE_SIZE),  # Input preprocessing size.
    )


def build_datamodule(dataset_root: Path, has_test_good: bool, use_abnormal: bool) -> Folder:
    # Sub-directories use forward slashes, which also resolve correctly on Windows.
    kwargs = {
        "name": CATEGORY_NAME,  # Dataset name.
        "root": str(dataset_root),  # Dataset root directory.
        "normal_dir": "train/good",  # Training good directory.
        "train_batch_size": TRAIN_BATCH_SIZE,  # Training batch size.
        "eval_batch_size": EVAL_BATCH_SIZE,  # Test batch size.
        "num_workers": NUM_WORKERS,  # Number of data loading workers.
    }

    if has_test_good:
        kwargs["normal_test_dir"] = "test/good"  # Add the test good directory.

    if use_abnormal:
        kwargs["abnormal_dir"] = "test/abnormal"  # Add the test abnormal directory.
        kwargs["mask_dir"] = "ground_truth/abnormal"  # Add the abnormal mask directory.

    return Folder(**kwargs)  # Create the Folder datamodule.


# =========================
# 6. Normal-only false-alarm check
# =========================

def run_normal_only_check(engine: Engine, model: Patchcore, datamodule: Folder) -> None:
    print("=" * 80)
    print("Starting the normal-only false-alarm check.")
    print("Note: only test/good exists and there are no abnormal images, so full anomaly detection metrics cannot be computed.")

    try:
        predictions = engine.predict(model=model, datamodule=datamodule)  # Run inference on test/good.
    except Exception as err:
        print(f"Normal-only inference failed, but model export is unaffected: {err}")  # Catch version compatibility problems.
        return

    scores = []  # Collected anomaly scores.

    for batch in predictions or []:
        # The output format depends on the Anomalib version: some releases return dicts with a
        # "pred_scores" key; others are assumed here to return batch objects with a pred_score attribute.
        if isinstance(batch, dict):
            score_tensor = batch.get("pred_scores")
        else:
            score_tensor = getattr(batch, "pred_score", None)
        if score_tensor is not None:
            scores.extend(torch.as_tensor(score_tensor).detach().cpu().flatten().tolist())  # Keep the scores.

    if len(scores) == 0:
        print("No anomaly scores were read; the output fields may differ in this anomalib version.")
        return

    scores_np = np.array(scores, dtype=np.float32)  # Convert to a NumPy array.
    print(f"test/good images: {len(scores_np)}")
    print(f"Minimum anomaly score on normal images: {scores_np.min():.6f}")
    print(f"Maximum anomaly score on normal images: {scores_np.max():.6f}")
    print(f"Mean anomaly score on normal images: {scores_np.mean():.6f}")
    print(f"Median anomaly score on normal images: {np.median(scores_np):.6f}")
    print("Suggestion: add abnormal images later before settling on a threshold and measuring the detection rate.")
    print("=" * 80)


# =========================
# 7. Train, evaluate, export
# =========================

def train_evaluate_export() -> None:
    dataset_root, has_test_good, use_abnormal = build_auto_dataset()  # Build or reuse the dataset.

    datamodule = build_datamodule(dataset_root, has_test_good, use_abnormal)  # Create the datamodule.
    model = build_model()  # Create the PatchCore model.
    engine = build_engine()  # Create the training engine.

    print("=" * 80)
    print("Starting PatchCore training.")
    print("=" * 80)

    engine.fit(datamodule=datamodule, model=model)  # Train PatchCore.

    if use_abnormal:
        print("=" * 80)
        print("Starting the full anomaly detection evaluation.")
        print("Note: test/abnormal is present, so the normal/abnormal separation can be evaluated.")
        print("=" * 80)
        engine.test(datamodule=datamodule, model=model)  # Run the full evaluation.
    elif has_test_good and RUN_NORMAL_ONLY_CHECK:
        run_normal_only_check(engine, model, datamodule)  # Run the normal-only check.
    else:
        print("Skipping evaluation: no abnormal images and no usable test/good.")

    EXPORT_ROOT.mkdir(parents=True, exist_ok=True)  # Create the export directory.

    engine.export(
        model=model,
        export_type=ExportType.TORCH,
        export_root=str(EXPORT_ROOT),
    )  # Export the Torch model.

    print("=" * 80)
    print("Done.")
    print(f"Model export directory: {EXPORT_ROOT}")
    print("=" * 80)


if __name__ == "__main__":
    os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")  # Temporary workaround for some OpenMP conflicts.
    train_evaluate_export()  # Run the whole flow.
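How the script above ends up in mode A, B, or C follows from two numbers: how many good images are left for testing after the split, and whether any abnormal images exist. The sketch below restates that logic in isolation. split_counts and run_mode are illustrative names that do not appear in the script, although the clamp mirrors the one in build_auto_dataset.

```python
def split_counts(num_good: int, train_ratio: float = 0.8) -> tuple[int, int]:
    """Return (train, test) counts for good images, using the same clamp as the script."""
    if num_good < 1:
        raise ValueError("at least one good image is required")
    if num_good == 1:
        return 1, 0  # A single image can only be used for training.
    train = max(1, min(int(num_good * train_ratio), num_good - 1))
    return train, num_good - train


def run_mode(num_good: int, num_abnormal: int, train_ratio: float = 0.8) -> str:
    """Map image counts to the run mode described in the script's docstring."""
    _, test_good = split_counts(num_good, train_ratio)
    if num_abnormal > 0:
        return "C"  # Training + full evaluation + export.
    return "B" if test_good > 0 else "A"  # B adds the normal-only check; A skips evaluation.


for n_good, n_abnormal in ((1, 0), (50, 0), (50, 8)):
    print(n_good, n_abnormal, split_counts(n_good), run_mode(n_good, n_abnormal))
```

In this logic, mode A only occurs with exactly one good image, because the clamp always leaves at least one test image once two or more are available.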

VI. Run Training and Evaluation

In the Anomalib environment, run:

python train_patchcore_auto.py

When the run finishes, you will usually see a metric table like this:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│        image_AUROC        │    0.8735449314117432     │
│       image_F1Score       │    0.6153846383094788     │
│        pixel_AUROC        │            0.0            │
│       pixel_F1Score       │            0.0            │
└───────────────────────────┴───────────────────────────┘

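PatchCore metrics include:

image_AUROC: image-level ability to separate normal from abnormal. It measures how far apart the anomaly scores of the two classes are (ideally 0.95 or higher).
image_AP: image-level average precision. It often reflects how well the model detects the abnormal class better than AUROC does.
image_F1Score: image-level F1, which combines Precision and Recall. Precision is the share of images flagged as abnormal that are truly abnormal; Recall is the share of truly abnormal images that the model finds. AUROC can be high while F1Score is not, because F1 depends on the chosen threshold.
pixel_AUROC: pixel-level localization ability, comparing the model's anomaly heat map with the real mask. Without real masks, or with all-black placeholder masks, this metric carries no meaning.
pixel_AP: pixel-level average precision, measuring the Precision-Recall behavior of anomaly region localization. It also requires real masks.
pixel_F1Score: pixel-level F1, measuring the Precision-Recall behavior of anomaly region localization. It also requires real masks.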
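A small worked example shows why AUROC and F1 can disagree. The scores below are made up for illustration, and the two functions are plain-Python sketches of the textbook definitions, not the metric implementations Anomalib uses.

```python
def auroc(normal_scores, abnormal_scores):
    """Probability that a random abnormal image scores above a random normal one (ties count half)."""
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_scores) * len(normal_scores))


def precision_recall_f1(normal_scores, abnormal_scores, threshold):
    """Flag an image as abnormal when its score is at or above the threshold."""
    tp = sum(s >= threshold for s in abnormal_scores)
    fp = sum(s >= threshold for s in normal_scores)
    fn = len(abnormal_scores) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


normal = [0.10, 0.20, 0.30, 0.35]     # made-up scores for normal images
abnormal = [0.40, 0.50, 0.60, 0.90]   # made-up scores for abnormal images

print("AUROC:", auroc(normal, abnormal))
print("threshold 0.80:", precision_recall_f1(normal, abnormal, 0.80))
print("threshold 0.38:", precision_recall_f1(normal, abnormal, 0.38))
```

Every abnormal score here is above every normal score, so the ranking is perfect, yet a threshold of 0.80 catches only one of the four abnormal images and drags recall and F1 down. A threshold of 0.38 separates the two groups completely.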

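Why are the pixel metrics 0?

If the results show:

pixel_AUROC = 0.0
pixel_F1Score = 0.0

the usual causes are that no real masks were provided, that mask_dir points to the wrong place, or that the mask file names do not match the image names. With the training script above, abnormal images get all-black placeholder masks, so these values are expected there.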
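One of these causes, mismatched names, is easy to check by script. The sketch below assumes that images and masks are paired by file stem, as in the layout used in this post (0000.jpg with 0000.png); how a given Anomalib version actually pairs them may differ. find_unpaired is an illustrative helper, and the demo runs on temporary empty files.

```python
import tempfile
from pathlib import Path


def find_unpaired(abnormal_dir: Path, mask_dir: Path) -> tuple[list[str], list[str]]:
    """Return (image stems without a mask, mask stems without an image)."""
    image_stems = {p.stem for p in abnormal_dir.iterdir() if p.is_file()}
    mask_stems = {p.stem for p in mask_dir.iterdir() if p.is_file()} if mask_dir.is_dir() else set()
    return sorted(image_stems - mask_stems), sorted(mask_stems - image_stems)


# Demo: two abnormal images, two masks, only one correctly paired.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "test" / "abnormal"
    masks = Path(tmp) / "ground_truth" / "abnormal"
    images.mkdir(parents=True)
    masks.mkdir(parents=True)
    for name in ("0000.jpg", "0001.jpg"):
        (images / name).touch()
    for name in ("0000.png", "0002.png"):
        (masks / name).touch()

    missing_masks, orphan_masks = find_unpaired(images, masks)
    print("images without a mask:", missing_masks)
    print("masks without an image:", orphan_masks)
```

Matching names are necessary but not sufficient: a mask that exists but is entirely black still gives meaningless pixel metrics.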

