[YOLOv5-6.x] Model Parameter Count (params) and Computation (FLOPs) Explained

嗜睡的篠龙 · published 2022-04-02 10:39:14

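Preface

When evaluating a neural network built with a deep learning framework, accuracy (for example mAP, the usual metric in object detection) is not the only thing that matters; model complexity must be considered as well. Complexity is usually described by the computation needed for one forward pass (FLOPs) and by the number of parameters (Parameters).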

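Parameter count (params) and computation (FLOPs) at a glance

Parameter count

Layers that have parameters include:
- Convolutional layers
- Fully connected layers
- BN layers
- Embedding layers
- A few activation layers (AconC)
- … …
Layers without parameters:
- Most activation layers (Sigmoid/ReLU/SiLU/Mish)
- Pooling layers
- Dropout layers
- … …
More specifically, the number of parameters per layer (ignoring the bias term b of Linear and Conv2d) is:
- Fully connected Linear(M->N): $M \times N$
- Convolution Conv2d(Cin, Cout, K): $C_{in} \times C_{out} \times K \times K$
- BatchNorm(N): $2N$ (one learnable scale and one learnable shift per channel)
- Embedding(N,W): $N \times W$
- … …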
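The formulas above can be written as a few small helper functions. This is only a sketch in plain Python: the function names and the optional `bias` and `groups` arguments are my own additions for illustration and do not come from YOLOv5 or PyTorch.

```python
def linear_params(m: int, n: int, bias: bool = False) -> int:
    """Linear(M -> N): an M x N weight matrix, plus N biases if enabled."""
    return m * n + (n if bias else 0)


def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = False, groups: int = 1) -> int:
    """Conv2d(C_in, C_out, K): C_out filters, each of shape (C_in / groups) x K x K."""
    return (c_in // groups) * c_out * k * k + (c_out if bias else 0)


def batchnorm_params(n: int) -> int:
    """BatchNorm(N): one learnable scale and one learnable shift per channel."""
    return 2 * n


def embedding_params(n: int, w: int) -> int:
    """Embedding(N, W): a lookup table with N rows of width W."""
    return n * w


print(linear_params(512, 10))       # 5120
print(conv2d_params(3, 32, 6))      # 3456
print(batchnorm_params(32))         # 64
print(embedding_params(1000, 128))  # 128000
```

The running statistics of a BN layer (mean and variance) are buffers rather than learnable parameters, so they are deliberately left out of `batchnorm_params`.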

The parameter information of the YOLOv5s model is as follows:

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]              
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  2    115712  models.common.C3                        [128, 128, 2]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]                 
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]                 
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
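As a cross-check, the params column of the table can be reproduced by hand. The sketch below assumes the module layouts as I understand them from models/common.py and models/yolo.py: a Conv block is a bias-free Conv2d followed by BatchNorm2d, C3 and SPPF use a hidden width of half their channel count, and Detect uses one biased 1x1 Conv2d per scale. The helper names are hypothetical.

```python
def conv_block(c_in, c_out, k=1):
    """YOLOv5 Conv block: Conv2d without bias + BatchNorm2d (SiLU adds no parameters)."""
    return c_in * c_out * k * k + 2 * c_out


def c3(c_in, c_out, n=1):
    """C3: three 1x1 Conv blocks plus n Bottlenecks (a 1x1 Conv block, then a 3x3 one)."""
    c_hid = c_out // 2  # hidden width, assuming the default expansion of 0.5
    bottleneck = conv_block(c_hid, c_hid, 1) + conv_block(c_hid, c_hid, 3)
    return 2 * conv_block(c_in, c_hid) + conv_block(2 * c_hid, c_out) + n * bottleneck


def sppf(c_in, c_out):
    """SPPF: a 1x1 Conv block, parameter-free max pooling, a 1x1 Conv block on 4 stacked maps."""
    c_hid = c_in // 2
    return conv_block(c_in, c_hid) + conv_block(4 * c_hid, c_out)


def detect(num_classes, anchors_per_scale, channels):
    """Detect: one 1x1 Conv2d with bias (and no BatchNorm) per feature scale."""
    n_out = anchors_per_scale * (num_classes + 5)
    return sum(c * n_out + n_out for c in channels)


# Layers with parameters, in table order (the Upsample and Concat rows contribute 0)
layer_params = [
    conv_block(3, 32, 6), conv_block(32, 64, 3), c3(64, 64, 1),
    conv_block(64, 128, 3), c3(128, 128, 2), conv_block(128, 256, 3),
    c3(256, 256, 3), conv_block(256, 512, 3), c3(512, 512, 1), sppf(512, 512),
    conv_block(512, 256, 1), c3(512, 256, 1), conv_block(256, 128, 1),
    c3(256, 128, 1), conv_block(128, 128, 3), c3(256, 256, 1),
    conv_block(256, 256, 3), c3(512, 512, 1), detect(80, 3, [128, 256, 512]),
]

print(layer_params[0])    # 3520
print(layer_params[2])    # 18816
print(layer_params[-1])   # 229245
print(sum(layer_params))  # 7235389
```

The `False` argument on the neck C3 rows only disables the residual shortcut, which has no parameters, so it does not appear in the arithmetic.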

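Computation

FLOPS
- Note: all uppercase
- Floating point operations per second; can be read as computing speed
- A measure of hardware performance
FLOPs
- Note: the s is lowercase
- Short for floating point operations (the s marks the plural); can be read as the amount of computation
- Formula (convolutional layer): $FLOPs=2 \times C_{out} \times H_{out} \times W_{out} \times C_{in} \times k^2$
  - The factor 2 counts each multiply-accumulate as one multiplication plus one addition.
  - Note that this formula does not include the bias term. A bias adds one addition per output element, so to include it use the formula below, with $bias=1$ if the layer has a bias and $bias=0$ otherwise:
  - $FLOPs=C_{out} \times H_{out} \times W_{out} \times (2 \times C_{in} \times k^2+bias)$
- Can be used to measure the complexity of an algorithm/model
GFLOPs
- Common in papers
- $1\,GFLOPs=10^9\,FLOPs$
- One billion floating point operations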
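To make the bias-free convolution formula concrete, here is a small worked example for the first layer of YOLOv5s (arguments [3, 32, 6, 2, 2], that is 3 input channels, 32 output channels, kernel 6, stride 2, padding 2) on a 640x640 image. The function names are hypothetical, and the output-size rule is the standard one for a convolution without dilation.

```python
def conv_output_size(size: int, k: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * pad - k) // stride + 1


def conv_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Bias-free convolution FLOPs: 2 * C_out * H_out * W_out * C_in * k^2."""
    return 2 * c_out * h_out * w_out * c_in * k * k


h_out = conv_output_size(640, k=6, stride=2, pad=2)
flops = conv_flops(c_in=3, c_out=32, k=6, h_out=h_out, w_out=h_out)

print(h_out)                  # 320
print(flops)                  # 707788800
print(round(flops / 1e9, 3))  # 0.708 (GFLOPs)
```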

Computing model parameters in YOLOv5

In YOLOv5-6.x, the model_info function in torch_utils.py computes and prints the model's summary information (parameters, gradients, FLOPs):

from copy import deepcopy

import torch

# LOGGER is YOLOv5's project logger, defined elsewhere in the repository


# Print the model's parameter count, FLOPs and related information
def model_info(model, verbose=False, img_size=640):
    # Model information. img_size may be int or list, i.e. img_size=640 or img_size=[640, 320]
    n_p = sum(x.numel() for x in model.parameters())  # number parameters
    # During training: gradients == parameters; during validation: gradients == 0
    n_g = sum(x.numel() for x in model.parameters() if x.requires_grad)  # number gradients
    if verbose:
        print(f"{'layer':>5} {'name':>40} {'gradient':>9} {'parameters':>12} {'shape':>20} {'mu':>10} {'sigma':>10}")
        for i, (name, p) in enumerate(model.named_parameters()):
            name = name.replace('module_list.', '')
            print('%5g %40s %9s %12g %20s %10.3g %10.3g' %
                  (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std()))
    # Use profile from the thop library to compute FLOPs
    try:  # FLOPs
        from thop import profile
        stride = max(int(model.stride.max()), 32) if hasattr(model, 'stride') else 32
        # input
        # img = torch.zeros((1, model.yaml.get('ch', 3), stride * 8, stride * 8), device=next(model.parameters()).device)  # experiment to help understand how the FLOPs are computed
        img = torch.zeros((1, model.yaml.get('ch', 3), stride, stride), device=next(model.parameters()).device)  # input
        # thop counts multiply-accumulates, so * 2 converts them to FLOPs
        flops = profile(deepcopy(model), inputs=(img,), verbose=False)[0] / 1E9 * 2  # GFLOPs at stride x stride input
        img_size = img_size if isinstance(img_size, list) else [img_size, img_size]  # expand if int/float
        # Scale the stride x stride result up to the requested image size
        fs = ', %.1f GFLOPs' % (flops * img_size[0] / stride * img_size[1] / stride)  # 640x640 GFLOPs
        # fs = ', %.1f GFLOPs' % (flops * img_size[0] / (stride * 8) * img_size[1] / (stride * 8))  # 640x640 GFLOPs  # experiment to help understand how the FLOPs are computed
    except Exception:  # thop is not installed or profiling failed
        fs = ''

    LOGGER.info(f"Model Summary: {len(list(model.modules()))} layers, {n_p} parameters, {n_g} gradients{fs}")
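model_info profiles the network on a tiny stride x stride image and then multiplies the result by (img_size / stride)^2. This works because convolution FLOPs grow linearly with the number of output pixels. The sketch below illustrates the idea on just the five stride-2 Conv layers of the YOLOv5s backbone, taken in isolation; it is a simplified stand-in for the real network, and the names are hypothetical.

```python
def conv_output_size(size: int, k: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * pad - k) // stride + 1


# (c_in, c_out, kernel, stride, padding) of the five stride-2 Conv layers in the backbone
DOWNSAMPLING_CONVS = [
    (3, 32, 6, 2, 2),
    (32, 64, 3, 2, 1),
    (64, 128, 3, 2, 1),
    (128, 256, 3, 2, 1),
    (256, 512, 3, 2, 1),
]


def stack_flops(img_size: int) -> int:
    """Total bias-free conv FLOPs of the stack for a square input image."""
    total, size = 0, img_size
    for c_in, c_out, k, stride, pad in DOWNSAMPLING_CONVS:
        size = conv_output_size(size, k, stride, pad)
        total += 2 * c_out * size * size * c_in * k * k
    return total


stride = 32
small = stack_flops(stride)      # profile on a 32x32 image
full = stack_flops(640)          # profile directly on a 640x640 image
scale = (640 // stride) ** 2     # same factor model_info applies

print(scale)                  # 400
print(small * scale == full)  # True
```

Profiling at 32x32 is much cheaper than at 640x640, and the result is exact as long as every feature-map size divides evenly, which holds when the image size is a multiple of the maximum stride.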

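Why training and validation report different model summaries

Model summary output (using YOLOv5s on coco2017 as an example)

Model summary printed during training: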

Model Summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.5 GFLOPs

Model summary printed during validation:

Fusing layers... 
Model Summary: 213 layers, 7225885 parameters, 0 gradients

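Why the numbers differ

layers
- The number of layers reported during validation is much lower: 270 - 213 = 57 fewer.
- The reason is the Fuse step that speeds up inference by merging each Conv layer with its BN layer; see the fuse_conv_and_bn function in torch_utils.py. After fusing, the BN module is removed from the model.
- YOLOv5s contains 57 Conv+BN pairs (counting those inside the C3 and SPPF modules), which matches the drop of 57 exactly, so fusion alone appears to account for the difference.
parameters
- This is also caused by Fuse. The $2N$ parameters of each BN layer are folded into the Conv weights plus a new Conv bias of $N$ values, so every fused pair loses $N$ parameters. The total drop is 7235389 - 7225885 = 9504, which equals the total number of BN channels in YOLOv5s.
gradients
- During training every parameter needs a gradient for backpropagation, so gradients = parameters.
- During validation the trained weights are loaded and are not updated, so no gradients are needed and gradients = 0.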
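The Conv+BN fusion discussed above can be checked with a few lines of arithmetic. The sketch below applies a 1x1 convolution to a single pixel in plain Python, folds inference-mode BatchNorm into the weights, and confirms that the output is unchanged while the parameter count drops. It mirrors the math only; it is not the code of fuse_conv_and_bn, and all names and numbers are made up for illustration.

```python
import math


def conv1x1(x, weight, bias=None):
    """1x1 convolution on one pixel: x has C_in values, weight is C_out x C_in."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using the stored running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fuse(weight, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into a bias-free convolution; returns fused weight and bias."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    fused_weight = [[sc * w for w in row] for sc, row in zip(scale, weight)]
    fused_bias = [b - sc * m for b, sc, m in zip(beta, scale, mean)]
    return fused_weight, fused_bias


weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]  # C_out = 2, C_in = 3
gamma, beta = [1.2, 0.8], [0.1, -0.3]            # learnable BN scale and shift
mean, var = [0.4, -0.2], [1.5, 0.6]              # BN running statistics (buffers)
x = [1.0, 2.0, 3.0]

reference = batchnorm(conv1x1(x, weight), gamma, beta, mean, var)
fused_weight, fused_bias = fuse(weight, gamma, beta, mean, var)
fused = conv1x1(x, fused_weight, fused_bias)
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(reference, fused))

params_before = 2 * 3 + 2 * 2  # conv weights + BN scale and shift
params_after = 2 * 3 + 2       # conv weights + fused conv bias

print(same)                          # True
print(params_before - params_after)  # 2, i.e. N = C_out parameters saved
```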