yolov26改进 | Conv/卷积篇 | 轻量化下采样操作ADown二次创新C3k2（参数量下降百分之二十，附手撕结构图）

Snu77

397人浏览 · 2026-05-08 22:08:31

Snu77 · 2026-05-08 22:08:31 发布

开始讲解之前推荐一下我的专栏，本专栏的内容支持(分类、检测、分割、追踪、关键点检测),专栏目前为限时折扣，欢迎大家订阅本专栏，本专栏每周更新5-7篇最新机制，更有包含我所有改进的文件和交流群提供给大家。

一、本文介绍

本文给大家带来的改进机制是利用ADown模块来改进我们的Conv模块，经过实验我发现该卷积模块（作为下采样模块）首先可以大幅度降低参数值，其次其精度上也有很高的提升，大家可以尝试以下在自己数据集上的效果，但该模型是由YOLOv9提出，这个结构效果很好，但是其原始是默认下采样特征图的大小，本文优化了其结构可以决定是否进行下采样，大家可以用于二次创新其它模块，本文用其二次创新C3k2模块。

欢迎大家订阅我的专栏一起学习YOLO！

专栏链接：YOLOv26有效涨点专栏包含：Conv、注意力机制、主干/Backbone、损失函数、优化器、后处理等改进机制

一、本文介绍

二、框架图

2.1 Programmable Gradient Information/可编程梯度信息

2.1.1 Auxiliary Reversible Branch/辅助可逆分支

2.1.2 Multi-level Auxiliary Information/多级辅助信息

2.2 Generalized ELAN

三、核心代码

四、手把手教你添加ADown机制

4.1 修改一

4.2 修改二

4.3 修改三

4.4 修改四

4.5 修改五

4.6 修改六

5.1 yaml文件

二、框架图

这张图（图3）展示了可编程梯度信息（PGI）及其相关网络架构和方法。图中展示了四种不同的网络设计：

a) PAN (Path Aggregation Network)：这种网络结构主要用于改进特征融合，以提高目标检测的性能。然而，由于信息瓶颈的存在，网络中可能会丢失一些信息。

b) RevCol (Reversible Columns)：这是一种旨在减少信息丢失的网络设计。它通过可逆的列结构来尝试维持信息流通不受损失，但如图中“Heavy Cost”所示，这种结构会增加计算成本。

c) 深度监督：这种方法通过在网络的多个层次中插入额外的监督信号来提高学习的效率和最终模型的性能。图中显示了通过深度监督连接的各个层。

d) 可编程梯度信息 (PGI)：PGI是作者提出的一种新方法（我理解的这种方法就是在前向传播的过程中没有跳级链接），它主要由三个部分组成：
1. 主分支：用于推理的架构。
2. 辅助可逆分支：生成可靠的梯度，以供给主分支进行反向传播。
3. 多级辅助信息：控制主分支学习可规划的多级语义信息。

PGI的目的是通过辅助可逆分支（如图中虚线框所示）来解决信息瓶颈问题，以便在不增加推理成本的情况下为深度网络提供更可靠的梯度。通过这种设计，即使是轻量级和浅层的神经网络也可以实现有效的信息保留和准确的梯度更新。如图中的深色方框所示的主分支，通过辅助可逆分支提供的可靠梯度信息，可以获得更有效的目标任务特征，而不会因为信息瓶颈而损失重要信息。

图中的符号代表不同的操作：灰色圆形代表池化操作，白色圆形代表上采样操作，灰色方块代表预测头，蓝色方块代表辅助分支，深色方块代表主分支。这种设计允许网络在保持高效计算的同时，也能够处理复杂的目标检测任务。

2.1 Programmable Gradient Information/可编程梯度信息

为了解决前述问题，我们提出了一种新的辅助监督框架，称为可编程梯度信息（PGI），如图3（d）所示。PGI主要包括三个部分，即（1）主分支，（2）辅助可逆分支和（3）多级辅助信息。从图3（d）我们可以看到，PGI的推理过程只使用主分支，因此不需要任何额外的推理成本。至于其他两个部分，它们用于解决或减缓深度学习方法中的几个重要问题。其中，辅助可逆分支旨在处理由神经网络加深造成的问题。网络加深将导致信息瓶颈，这将使得损失函数无法生成可靠的梯度。至于多级辅助信息，它旨在处理由深度监督造成的误差累积问题，特别是对于具有多个预测分支的架构和轻量型模型。接下来，我们将逐步介绍这两个部分。

2.1.1 Auxiliary Reversible Branch/辅助可逆分支

在PGI中，我们提出了辅助可逆分支来生成可靠的梯度并更新网络参数。通过提供从数据到目标的映射信息，损失函数可以提供指导，并避免从与目标关系较小的不完整前馈特征中找到错误相关性的可能性。我们提出通过引入可逆架构来维持完整信息，但在可逆架构中添加主分支将消耗大量的推理成本。我们分析了图3（b）的架构，并发现在深层到浅层添加额外连接时，推理时间将增加20%。当我们反复将输入数据添加到网络的高分辨率计算层（黄色框），推理时间甚至超过了两倍。

由于我们的目标是使用可逆架构来获取可靠的梯度，因此“可逆”并不是推理阶段的唯一必要条件。鉴于此，我们将可逆分支视为深度监督分支的扩展，并设计了如图3（d）所示的辅助可逆分支。至于主分支，由于信息瓶颈可能会丢失重要信息的深层特征，将能够从辅助可逆分支接收可靠的梯度信息。这些梯度信息将推动参数学习，以帮助提取正确和重要的信息，并使主分支能够获取更有效的目标任务特征。此外，由于复杂任务需要在更深的网络中进行转换，可逆架构在浅层网络上的表现不如在一般网络上。我们提出的方法不强迫主分支保留完整的原始信息，而是通过辅助监督机制生成有用的梯度来更新它。这种设计的优势是，所提出的方法也可以应用于较浅的网络。最后，由于辅助可逆分支可以在推理阶段移除，因此可以保留原始网络的推理能力。我们还可以在PGI中选择任何可逆架构来充当辅助可逆分支的角色。

2.1.2 Multi-level Auxiliary Information/多级辅助信息

在本节中，我们将讨论多级辅助信息是如何工作的。包含多个预测分支的深度监督架构如图3（c）所示。对于对象检测，可以使用不同的特征金字塔来执行不同的任务，例如它们可以一起检测不同大小的对象。因此，连接到深度监督分支后，浅层特征将被引导学习小对象检测所需的特征，此时系统将将其他大小的对象位置视为背景。然而，上述行为将导致深层特征金字塔丢失很多预测目标对象所需的信息。对于这个问题，我们认为每个特征金字塔都需要接收所有目标对象的信息，以便后续主分支能够保留完整信息来学习对各种目标的预测。

多级辅助信息的概念是在辅助监督的特征金字塔层之间和主分支之间插入一个集成网络，然后使用它来结合不同预测头返回的梯度，如图3（d）所示。然后，多级辅助信息将汇总包含所有目标对象的梯度信息，并将其传递给主分支然后更新参数。此时，主分支的特征金字塔层次的特性不会被某些特定对象的信息所主导。因此，我们的方法可以缓解深度监督中的断裂信息问题。此外，任何集成网络都可以在多级辅助信息中使用。因此，我们可以规划所需的语义级别来指导不同大小的网络架构的学习。

2.2 Generalized ELAN

在本节中，我们描述了提出的新网络架构 - GELAN。通过结合两种神经网络架构CSPNet和ELAN，这两种架构都是以梯度路径规划设计的，我们设计了考虑了轻量级、推理速度和准确性的广义高效层聚合网络（GELAN）。其整体架构如图4所示。我们推广了ELAN的能力，ELAN原本只使用卷积层的堆叠，到一个新的架构，可以使用任何计算块。

这张图（图4）展示了广义高效层聚合网络（GELAN）的架构，以及它是如何从CSPNet和ELAN这两种神经网络架构演变而来的。这两种架构都设计有梯度路径规划。

a) CSPNet：在CSPNet的架构中，输入通过一个转换层被分割为两部分，然后分别通过任意的计算块。之后，这些分支被重新合并（通过concatenation），并再次通过转换层。

b) ELAN：与CSPNet相比，ELAN采用了堆叠的卷积层，其中每一层的输出都会与下一层的输入相结合，再经过卷积处理。

c) GELAN：结合了CSPNet和ELAN的设计，提出了GELAN。它采用了CSPNet的分割和重组的概念，并在每一部分引入了ELAN的层级卷积处理方式。不同之处在于GELAN不仅使用卷积层，还可以使用任何计算块，使得网络更加灵活，能够根据不同的应用需求定制。

GELAN的设计考虑到了轻量化、推理速度和精确度，以此来提高模型的整体性能。图中显示的模块和分区的可选性进一步增加了网络的适应性和可定制性。GELAN的这种结构允许它支持多种类型的计算块，这使得它可以更好地适应各种不同的计算需求和硬件约束。

总的来说，GELAN的架构是为了提供一个更加通用和高效的网络，可以适应从轻量级到复杂的深度学习任务，同时保持或增强计算效率和性能。通过这种方式，GELAN旨在解决现有架构的限制，提供一个可扩展的解决方案，以适应未来深度学习的发展。

大家看图片一眼就能看出来它融合了什么，就是将CSPHet的anyBlock模块堆叠的方式和ELAN融合到了一起。

目前针对该结构并无原理介绍，下面的图片为我个人经过代码复现的结构图，结构上也是非常的简单。

三、核心代码

核心代码的使用方式看章节四！

import torch
import torch.nn as nn
import torch.nn.functional as F

__all__ = ['Snu77ADown', 'C3k2_ADown']

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    # Pad to 'same' shape outputs
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))

class Snu77ADown(nn.Module):
    def __init__(self, c1, c2, downsample=False):
        super().__init__()
        self.downsample = downsample
        self.c = c2 // 2

        self.cv1 = Conv(c1 // 2, self.c, 3, s=2 if downsample else 1, p=1)
        self.cv2 = Conv(c1 // 2, self.c, 1, 1, 0)

    def forward(self, x):
        if self.downsample:
            x = F.avg_pool2d(x, 2, 1, 0, False, True)

        x1, x2 = x.chunk(2, 1)

        x1 = self.cv1(x1)

        if self.downsample:
            x2 = F.max_pool2d(x2, 3, 2, 1)
        else:
            x2 = F.max_pool2d(x2, 3, 1, 1)

        x2 = self.cv2(x2)

        return torch.cat((x1, x2), 1)


class Bottleneck_ADown(nn.Module):
    # Standard bottleneck with DCN
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Snu77ADown(c_, c2, False)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(
        self, c1: int, c2: int, shortcut: bool = True, g: int = 1, k: tuple[int, int] = (3, 3), e: float = 0.5
    ):
        """Initialize a standard bottleneck module.

        Args:
            c1 (int): Input channels.
            c2 (int): Output channels.
            shortcut (bool): Whether to use shortcut connection.
            g (int): Groups for convolutions.
            k (tuple): Kernel sizes for convolutions.
            e (float): Expansion ratio.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply bottleneck with optional shortcut connection."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C2f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 2 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))

class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))

class Attention_YOLOv26(nn.Module):
    """Attention module that performs self-attention on the input tensor.

    Args:
        dim (int): The input tensor dimension.
        num_heads (int): The number of attention heads.
        attn_ratio (float): The ratio of the attention key dimension to the head dimension.

    Attributes:
        num_heads (int): The number of attention heads.
        head_dim (int): The dimension of each attention head.
        key_dim (int): The dimension of the attention key.
        scale (float): The scaling factor for the attention scores.
        qkv (Conv): Convolutional layer for computing the query, key, and value.
        proj (Conv): Convolutional layer for projecting the attended values.
        pe (Conv): Convolutional layer for positional encoding.
    """

    def __init__(self, dim: int, num_heads: int = 8, attn_ratio: float = 0.5):
        """Initialize multi-head attention module.

        Args:
            dim (int): Input dimension.
            num_heads (int): Number of attention heads.
            attn_ratio (float): Attention ratio for key dimension.
        """
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.key_dim = int(self.head_dim * attn_ratio)
        self.scale = self.key_dim**-0.5
        nh_kd = self.key_dim * num_heads
        h = dim + nh_kd * 2
        self.qkv = Conv(dim, h, 1, act=False)
        self.proj = Conv(dim, dim, 1, act=False)
        self.pe = Conv(dim, dim, 3, 1, g=dim, act=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass of the Attention module.

        Args:
            x (torch.Tensor): The input tensor.

        Returns:
            (torch.Tensor): The output tensor after self-attention.
        """
        B, C, H, W = x.shape
        N = H * W
        qkv = self.qkv(x)
        q, k, v = qkv.view(B, self.num_heads, self.key_dim * 2 + self.head_dim, N).split(
            [self.key_dim, self.key_dim, self.head_dim], dim=2
        )

        attn = (q.transpose(-2, -1) @ k) * self.scale
        attn = attn.softmax(dim=-1)
        x = (v @ attn.transpose(-2, -1)).view(B, C, H, W) + self.pe(v.reshape(B, C, H, W))
        x = self.proj(x)
        return x


class PSABlock(nn.Module):
    """PSABlock class implementing a Position-Sensitive Attention block for neural networks.

    This class encapsulates the functionality for applying multi-head attention and feed-forward neural network layers
    with optional shortcut connections.

    Attributes:
        attn (Attention): Multi-head attention module.
        ffn (nn.Sequential): Feed-forward neural network module.
        add (bool): Flag indicating whether to add shortcut connections.

    Methods:
        forward: Performs a forward pass through the PSABlock, applying attention and feed-forward layers.

    """

    def __init__(self, c: int, attn_ratio: float = 0.5, num_heads: int = 4, shortcut: bool = True) -> None:
        """Initialize the PSABlock.

        Args:
            c (int): Input and output channels.
            attn_ratio (float): Attention ratio for key dimension.
            num_heads (int): Number of attention heads.
            shortcut (bool): Whether to use shortcut connections.
        """
        super().__init__()

        self.attn = Attention_YOLOv26(c, attn_ratio=attn_ratio, num_heads=num_heads)
        self.ffn = nn.Sequential(Conv(c, c * 2, 1), Conv(c * 2, c, 1, act=False))
        self.add = shortcut

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Execute a forward pass through PSABlock.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor after attention and feed-forward processing.
        """
        x = x + self.attn(x) if self.add else self.attn(x)
        x = x + self.ffn(x) if self.add else self.ffn(x)
        return x

class C3k_ADown(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck_ADown(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3k2_ADown(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(
        self,
        c1: int,
        c2: int,
        n: int = 1,
        c3k: bool = False,
        e: float = 0.5,
        attn: bool = False,
        g: int = 1,
        shortcut: bool = True,
    ):
        """Initialize C3k2 modu
        Args:
            c1 (int): Input channels.
            c2 (int): Output channels.
            n (int): Number of blocks.
            c3k (bool): Whether to use C3k blocks.
            e (float): Expansion ratio.
            attn (bool): Whether to use attention blocks.
            g (int): Groups for convolutions.
            shortcut (bool): Whether to use shortcut connections.
        """
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            nn.Sequential(
                Bottleneck_ADown(self.c, self.c, shortcut, g),
                PSABlock(self.c, attn_ratio=0.5, num_heads=max(self.c // 64, 1)),
            )
            if attn
            else C3k_ADown(self.c, self.c, 2, shortcut, g)
            if c3k
            else Bottleneck_ADown(self.c, self.c, shortcut, g)
            for _ in range(n)
        )

if __name__ == '__main__':
    x = torch.randn(1, 32, 16, 16)
    model = C3k2_ADown(32, 32,2,True,0.25, True)
    print(model(x).shape)

四、手把手教你添加ADown机制

下面的步骤如果你不会或者不想麻烦操作，可以联系作者获得本专栏添加所有项目文件的源代码，可直接训练.

4.1 修改一

第一还是建立文件，我们找到如下ultralytics/nn文件夹下建立一个目录名字呢就是'Addmodules'文件夹！

4.2 修改二

然后在Addmodules文件夹内建立一个新的py文件，将本文章节三中的“核心代码"复制粘贴进去。

4.3 修改三

第二步我们在该目录下创建一个新的py文件名字为'__init__.py'，然后在其内部导入我们的文件，如下图所示。

4.4 修改四

第三步我门中到如下文件'ultralytics/nn/tasks.py'进行导入和注册我们的模块(此处只需要添加一次即可，如果你用我其它的改进机制这里的步骤只需要添加一次)！

4.5 修改五

在'ultralytics/nn/tasks.py'文件内的parse_model方法函数内（位置大概在1500+行左右），按照图示位置添加即可（此处需要自己有一定的判别能力，如果不会可联系作者获得视频教程）。

4.6 修改六

在'ultralytics/nn/tasks.py'文件内的parse_model方法函数内（位置大概在1600+行左右），按照图示位置进行代码的替换即可（此处不改如果你yaml文件中的所有C3k2都被改名了，则检测头会使用老版本的v8检测头参数量会大幅度增加，但不影响运行很多人都忽略了这一步）。

            if "C3k2" in getattr(m, "__name__", str(m)):
                legacy = False
                if scale in "mlx":
                    args[3] = True

到此就修改完成了，大家可以复制下面的yaml文件运行，更多使用方式可以联系作者获得使用视频，本文仅列出常见的使用方式。。

五、正式训练

5.1 yaml文件

5.1.1 yaml文件1

训练信息：YOLO26-ADown summary: 273 layers, 2,089,720 parameters, 2,089,720 gradients, 4.9 GFLOPs

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Snu77ADown, [128, True]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Snu77ADown, [256, True]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Snu77ADown, [512, True]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Snu77ADown, [1024, True]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

#  - [-1, 1, Conv, [256, 3, 2]]
  - [-1, 1, Snu77ADown, [256, True]]  # 和上面一层二选一.
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)

#  - [-1, 1, Conv, [512, 3, 2]]
  - [-1, 1, Snu77ADown, [512, True]]  # 和上面一层二选一.
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

5.1.2 yaml文件2

训练信息：YOLO26-C3k2-ADown-1 summary: 273 layers, 2,501,560 parameters, 2,501,560 gradients, 5.9 GFLOPs

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_ADown, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_ADown, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_ADown, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_ADown, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

5.1.3 yaml文件3

训练信息：YOLO26-C3k2-ADown-2 summary: 275 layers, 2,489,080 parameters, 2,489,080 gradients, 5.9 GFLOPs

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_ADown, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_ADown, [256, True]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_ADown, [512, True]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2_ADown, [1024, True, 0.5, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

5.1.4 yaml文件4

训练信息：YOLO26-C3k2-ADown-3 summary: 263 layers, 3,283,864 parameters, 3,283,864 gradients, 9.3 GFLOPs

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_ADown, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_ADown, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_ADown, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_ADown, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_ADown, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_ADown, [256, True]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_ADown, [512, True]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2_ADown, [1024, True, 0.5, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

5.2 训练代码

大家可以创建一个py文件将我给的代码复制粘贴进去，配置好自己的文件路径即可运行。

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('模型配置文件地址,也就是5.1你保存到本地文件的地址')
    # 如何切换模型版本, 上面的ymal文件可以改为 yolo26s.yaml就是使用的26s,
    # 类似某个改进的yaml文件名称为yolo26-XXX.yaml那么如果想使用其它版本就把上面的名称改为yolo26l-XXX.yaml即可（改的是上面YOLO中间的名字不是配置文件的）！
    # model.load('yolo26n.pt') # 是否加载预训练权重,科研不建议大家加载否则很难提升精度
    model.train(
        data=r"数据集文件地址",
                # 如果大家任务是其它的'ultralytics/cfg/default.yaml'找到这里修改task可以改成detect, segment, classify, pose
                cache=False,
                imgsz=640,
                epochs=20,
                single_cls=False,  # 是否是单类别检测
                batch=16,
                close_mosaic=0,
                workers=0,
                device='0',
                optimizer='MuSGD', # using SGD/MuSGD
                # resume=, # 这里是填写last.pt地址
                amp=True,  # 如果出现训练损失为Nan可以关闭amp
                project='runs/train',
                name='exp',
                )

5.3 训练过程截图

五、本文总结

到此本文的正式分享内容就结束了，在这里给大家推荐我的YOLOv26改进有效涨点专栏，本专栏目前为新开的平均质量分98分，后期我会根据各种最新的前沿顶会进行论文复现，也会对一些老的改进机制进行补充，如果大家觉得本文帮助到你了，订阅本专栏，关注后续更多的更新~

专栏链接：YOLOv26有效涨点专栏包含：Conv、注意力机制、主干/Backbone、损失函数、优化器、后处理等改进机制

AtomGit开源社区

AtomGit 是由开放原子开源基金会联合 CSDN 等生态伙伴共同推出的新一代开源与人工智能协作平台。平台坚持“开放、中立、公益”的理念，把代码托管、模型共享、数据集托管、智能体开发体验和算力服务整合在一起，为开发者提供从开发、训练到部署的一站式体验。

更多推荐

GitHub 开源光谱数据处理项目推荐

AtomGit开源社区

微软 BitNet 在 x86/ARM CPU 上实现 2–6 倍推理加速、70–80%+ 能耗下降，并可在单颗 CPU 上运行 100B 参数 BitNet b1.58 模型

微软推出的BitNet b1.58是一种革命性的1.58比特大语言模型架构，通过三值量化将权重压缩至{-1,0,+1}，结合8比特整数激活，在几乎保持任务性能的同时，使大模型能在CPU和边缘设备上高效运行。其核心优势包括：10倍权重压缩、70-80%能耗降低、支持x86/ARM架构CPU原生推理。官方开源了bitnet.cpp推理框架，优化了专用内核，在单CPU上即可运行100B参数模型。目前已发