Welcome to Chapter 7 of the Computer Vision in Action (《计算机视觉实战》) tutorial series. In Chapter 6 we covered PyTorch basics and building simple neural networks; in this chapter we dive into Convolutional Neural Networks (CNNs), the core deep learning model of computer vision.

Ever since AlexNet's breakthrough in the 2012 ImageNet competition, CNNs have been the backbone of visual tasks such as image recognition, object detection, and semantic segmentation. Even with the later rise of Transformer-based architectures, CNNs remain indispensable building blocks thanks to their efficiency and strong ability to capture multi-scale features.


1. Environment

  • Python: 3.12+
  • PyTorch: 2.2+
  • torchvision: 0.17+
  • NumPy: 1.26+
  • matplotlib: 3.8+
  • CUDA: 12.1+ (a GPU is strongly recommended)

2. Convolution in Detail

2.1 From Fully Connected Layers to Convolutional Layers

In a traditional neural network (e.g., an MLP), every neuron is connected to all neurons in the previous layer. This "fully connected" structure runs into serious problems on high-dimensional image data:

Parameter explosion: for a 224x224x3 RGB image, a first hidden layer with 1000 neurons already needs 224x224x3x1000 ≈ 150 million parameters.

Loss of spatial information: flattening the image into a 1D vector discards all spatial relationships between pixels.

The core ideas of a convolutional layer:

  • Local connectivity: each neuron is connected only to a local region of the input
  • Weight sharing: the same kernel parameters are reused across the entire input
import torch
import torch.nn as nn

# Compare parameter counts: fully connected layer vs convolutional layer

# Fully connected layer
fc = nn.Linear(in_features=224*224*3, out_features=1000)
fc_params = sum(p.numel() for p in fc.parameters())
print(f"FC layer parameters: {fc_params:,} (~{fc_params/1e6:.1f} million)")

# Convolutional layer
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
conv_params = sum(p.numel() for p in conv.parameters())
print(f"Conv layer parameters (64@3x3): {conv_params:,}")

# Add another convolutional layer
conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
total_conv_params = sum(p.numel() for p in conv.parameters()) + \
                    sum(p.numel() for p in conv2.parameters())
print(f"Two conv layers, total parameters: {total_conv_params:,}")

print(f"\nParameter savings: {fc_params/total_conv_params:.0f}x")

2.2 How Convolution Works

Kernel (filter): a small matrix (e.g., 3x3 or 5x5) containing the learnable weights.

The convolution procedure:

  1. The kernel slides over the input image (from top-left to bottom-right)
  2. At each position, multiply the kernel element-wise with the local input region and sum the products
  3. The output is called a feature map

Key hyperparameters:

  • Kernel size: typically 3x3, 5x5, or 7x7
  • Stride: how many pixels the kernel moves at each step
  • Padding: the number of pixels added around the input border
import torch
import torch.nn as nn

def manual_conv2d(input_tensor, kernel, stride=1, padding=0):
    """
    Manual 2D convolution (cross-correlation, as in PyTorch).
    input_tensor: (C, H, W)
    kernel: (C, kH, kW)
    """
    C, H, W = input_tensor.shape
    kH, kW = kernel.shape[1:]

    # Apply zero padding
    if padding > 0:
        pad_input = torch.zeros(C, H + 2*padding, W + 2*padding)
        pad_input[:, padding:padding+H, padding:padding+W] = input_tensor
    else:
        pad_input = input_tensor

    # Output size
    out_H = (H + 2*padding - kH) // stride + 1
    out_W = (W + 2*padding - kW) // stride + 1

    # Initialize the output
    output = torch.zeros(out_H, out_W)

    # Slide the kernel over the input
    for i in range(out_H):
        for j in range(out_W):
            h_start = i * stride
            w_start = j * stride
            region = pad_input[:, h_start:h_start+kH, w_start:w_start+kW]
            output[i, j] = (region * kernel).sum()

    return output

# Input and kernel
input_tensor = torch.randn(1, 5, 5)  # single channel, 5x5
kernel = torch.tensor([[1, 0, -1],
                       [2, 0, -2],
                       [1, 0, -1]], dtype=torch.float32).unsqueeze(0)  # Sobel-style edge detector, (1, 3, 3)

# Manual computation (both arguments are (C, ...) tensors)
manual_result = manual_conv2d(input_tensor, kernel, stride=1, padding=0)

# Verify against PyTorch
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)
conv.weight.data = kernel.unsqueeze(0)  # weight shape must be (out_C, in_C, kH, kW) = (1, 1, 3, 3)
with torch.no_grad():
    output = conv(input_tensor.unsqueeze(0))  # add a batch dimension

print("Manual result:")
print(manual_result)
print("\nPyTorch result:")
print(output[0, 0])
print(f"\nMax difference: {(manual_result - output[0, 0]).abs().max():.6f}")

2.3 Convolution Hyperparameters in Detail

import torch
import torch.nn as nn

# Different convolution configurations
print("Convolution parameter counts:")
print("=" * 50)

# Example 1: standard convolution
conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
params1 = sum(p.numel() for p in conv1.parameters())
print(f"Conv(3->64, 3x3, pad=1)")
print(f"  Parameters: {params1:,}")
print(f"  Computation: 64x3x3x3 + 64 = {64*3*3*3 + 64}")

# Example 2: no padding
conv2 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=0)
params2 = sum(p.numel() for p in conv2.parameters())
print(f"\nConv(3->64, 3x3, pad=0)")
print(f"  Parameters: {params2:,}")
print(f"  Output size: (H-2) x (W-2)")

# Example 3: large kernel
conv3 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, padding=3)
params3 = sum(p.numel() for p in conv3.parameters())
print(f"\nConv(3->64, 7x7, pad=3)")
print(f"  Parameters: {params3:,}")
print(f"  Computation: 64x7x7x3 + 64 = {64*7*7*3 + 64}")

# Example 4: dilated convolution
conv4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=2, dilation=2)
params4 = sum(p.numel() for p in conv4.parameters())
print(f"\nConv(64->64, 3x3, pad=2, dilation=2)")
print(f"  Parameters: {params4:,}")
print(f"  Receptive field: 5x5 (equivalent to a 5x5 kernel)")

# Verify the output-size formula
print("\n" + "=" * 50)
print("Convolution output-size formula:")
print("output_size = floor((input_size + 2*padding - kernel_size) / stride) + 1")

# Try it
test_input = torch.randn(1, 3, 32, 32)
print(f"\nInput: {test_input.shape}")

test_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=1)
output = test_conv(test_input)
print(f"Conv(3->64, 3x3, pad=1, stride=1): {output.shape}")

test_conv_stride2 = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=2)
output_stride2 = test_conv_stride2(test_input)
print(f"Conv(3->64, 3x3, pad=1, stride=2): {output_stride2.shape}")

2.4 Multi-Channel Convolution

import torch
import torch.nn as nn

# Multi-channel input, multi-channel output
# Input: (batch, C_in, H, W)
# Output: (batch, C_out, H', W')

input_tensor = torch.randn(1, 3, 8, 8)  # 3-channel input
print(f"Input shape: {input_tensor.shape}")

# Multi-channel convolutional layer
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
output = conv(input_tensor)
print(f"Output shape: {output.shape}")

# Inspect the weight shape
print(f"\nWeight shape: {conv.weight.shape}")  # [8, 3, 3, 3] - 8 output channels, each with one kernel slice per input channel
print(f"Bias shape: {conv.bias.shape}")  # [8] - one bias per output channel

# Each output channel is the sum of convolutions over all input channels
print("\nEach output channel = Σ(input channel_i × kernel slice_i) + bias")
print(f"So parameter count = C_out × C_in × K × K + C_out")
print(f"                   = 8 × 3 × 3 × 3 + 8 = {8*3*3*3 + 8}")
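The per-output-channel formula can be verified numerically: convolve each input channel with the matching kernel slice via `torch.nn.functional.conv2d`, sum over channels, and add the bias. A minimal sketch (the seed and tensor sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(1, 3, 8, 8)

with torch.no_grad():
    full = conv(x)
    # Rebuild output channel 0 by hand: convolve each input channel with its
    # kernel slice, sum over input channels, then add that channel's bias
    manual = sum(
        F.conv2d(x[:, c:c+1], conv.weight[0:1, c:c+1], padding=1)
        for c in range(3)
    ) + conv.bias[0]

print(torch.allclose(full[:, 0:1], manual, atol=1e-5))  # True: the decomposition matches
```

The same decomposition holds for every output channel; it is exactly what exercise 1 at the end of the chapter asks you to implement from scratch.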

3. Pooling Layers

3.1 How Pooling Works

The pooling layer is another key CNN component. Its main purposes are:

  1. Reducing feature-map size: less downstream computation
  2. Enlarging the receptive field: later convolutions see a larger image region
  3. Providing approximate translation invariance: small shifts do not change the pooled output

Max pooling: keeps the maximum value of each region
Average pooling: keeps the mean value of each region

import torch
import torch.nn as nn

# Max pooling
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
# Average pooling
avgpool = nn.AvgPool2d(kernel_size=2, stride=2)

# Test input
input_tensor = torch.arange(1, 17, dtype=torch.float32).view(1, 1, 4, 4)
print("Input feature map:")
print(input_tensor[0, 0])

# Apply pooling
output_max = maxpool(input_tensor)
output_avg = avgpool(input_tensor)

print("\nMax pooling (2x2, stride=2):")
print(output_max[0, 0])

print("\nAverage pooling (2x2, stride=2):")
print(output_avg[0, 0])

# Verify the output size
print("\nPooling output-size formula (no padding):")
print("output_size = floor((input_size - kernel_size) / stride) + 1")
print(f"Input: 4x4, output: {output_max.shape[2]}x{output_max.shape[3]}")

# Global average pooling
gap = nn.AdaptiveAvgPool2d((1, 1))
global_pool = gap(input_tensor)
print(f"\nGlobal average pooling: {input_tensor.shape} -> {global_pool.shape}")
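The translation-invariance claim above can be illustrated with a small sketch: shift a bright pixel by one position within the same 2x2 pooling window, and the pooled output is unchanged (a minimal illustration, not a general proof — shifts that cross window boundaries do change the output):

```python
import torch
import torch.nn as nn

maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

# A single bright pixel at (0, 0) ...
a = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0

# ... and the same pixel shifted to (1, 1), still inside the same 2x2 window
b = torch.zeros(1, 1, 4, 4)
b[0, 0, 1, 1] = 1.0

print(torch.equal(maxpool(a), maxpool(b)))  # True: pooled outputs are identical
```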

3.2 Pooling vs Strided Convolution

import torch
import torch.nn as nn

# Both approaches reduce spatial resolution

# Option 1: pooling + convolution
pool_conv = nn.Sequential(
    nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16
    nn.Conv2d(3, 64, kernel_size=3, padding=1)  # stays 16x16
)

# Option 2: strided convolution
stride_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=2)  # 32x32 -> 16x16

# Compare
x = torch.randn(1, 3, 32, 32)
out1 = pool_conv(x)
out2 = stride_conv(x)

print(f"Input: {x.shape}")
print(f"Pool + conv: {out1.shape}")
print(f"Strided conv: {out2.shape}")

# Which is better?
print("\nComparison:")
print("Pooling: parameter-free, fixed downsampling rule (max or mean)")
print("Strided conv: downsampling is learned along with the features (more common in modern networks)")

4. Receptive Field

4.1 The Concept

The **receptive field** is the region of the input image that a single pixel of an output feature map depends on. Understanding receptive fields is essential for designing networks and debugging models.

import torch
import torch.nn as nn

def calculate_receptive_field(layers_info):
    """
    Compute the receptive field.
    layers_info: list of tuples (kernel_size, stride)
    """
    rf = 1  # initial receptive field
    stride_accum = 1

    print("Layer-by-layer receptive field:")
    print("-" * 50)

    for i, (ks, s) in enumerate(layers_info):
        new_rf = rf + (ks - 1) * stride_accum
        print(f"Layer {i+1}: kernel={ks}x{ks}, stride={s}")
        print(f"  RF_before={rf}, stride_accum={stride_accum}")
        print(f"  RF_after = {rf} + ({ks}-1) * {stride_accum} = {new_rf}")
        rf = new_rf
        stride_accum *= s

    print("-" * 50)
    return rf

# Example: two 3x3 convolutions
print("Two 3x3 convolutions (stride=1):")
rf1 = calculate_receptive_field([(3, 1), (3, 1)])
print(f"Final receptive field: {rf1}x{rf1}")
print("Note: two stacked 3x3 convolutions have the same receptive field as one 5x5!")

print("\n" + "=" * 50)

# Example: VGG-style network
print("\nVGG-style network:")
rf_vgg = calculate_receptive_field([(3, 1), (3, 1), (3, 1)])  # block 1
print(f"After the first block: {rf_vgg}x{rf_vgg}")

rf_vgg2 = calculate_receptive_field([(3, 1), (3, 1), (3, 1),
                                      (3, 2), (3, 1), (3, 1)])  # + a stride-2 layer
print(f"After the second block: {rf_vgg2}x{rf_vgg2}")
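The "two 3x3 equals one 5x5" claim is exact for purely linear convolutions (no activation in between): the two kernels can be composed into a single equivalent 5x5 kernel. A small sketch, single channel and no padding; the flip accounts for `F.conv2d` being cross-correlation rather than true convolution:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
k1 = torch.randn(1, 1, 3, 3)
k2 = torch.randn(1, 1, 3, 3)
x = torch.randn(1, 1, 9, 9)

# Two successive 3x3 convolutions (no padding): 9x9 -> 7x7 -> 5x5
two_step = F.conv2d(F.conv2d(x, k1), k2)

# Compose the two kernels into one equivalent 5x5 kernel: composing two
# cross-correlations corresponds to a true convolution of the kernels
k_eq = F.conv2d(k1, k2.flip([2, 3]), padding=2)  # shape (1, 1, 5, 5)
one_step = F.conv2d(x, k_eq)

print(k_eq.shape)
print(torch.allclose(two_step, one_step, atol=1e-5))  # True: same linear map
```

With a ReLU between the two 3x3 layers this collapse no longer holds, which is precisely VGG's point: same receptive field, fewer parameters, plus an extra nonlinearity.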

4.2 Effective Receptive Field

import torch
import torch.nn as nn

# Walk through the receptive field of a small CNN

def visualize_receptive_field():
    """Print the receptive field at each stage of a simple CNN."""

    # A simple CNN
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1),  # conv1
        nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1),  # conv2
        nn.ReLU(),
        nn.MaxPool2d(2, 2),  # pool1
        nn.Conv2d(64, 128, 3, padding=1),  # conv3
        nn.ReLU(),
        nn.Conv2d(128, 128, 3, padding=1),  # conv4
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1)  # GAP
    )

    # Print the architecture
    print("Architecture:")
    print("=" * 50)
    for i, layer in enumerate(model):
        if isinstance(layer, nn.Conv2d):
            print(f"Layer {i}: Conv2d({layer.in_channels}->{layer.out_channels}, "
                  f"kernel={layer.kernel_size[0]}, stride={layer.stride[0]}, "
                  f"padding={layer.padding[0]})")
        elif isinstance(layer, nn.MaxPool2d):
            print(f"Layer {i}: MaxPool2d(kernel={layer.kernel_size}, stride={layer.stride})")
        elif isinstance(layer, nn.ReLU):
            print(f"Layer {i}: ReLU")
        elif isinstance(layer, nn.AdaptiveAvgPool2d):
            print(f"Layer {i}: AdaptiveAvgPool2d(1)")

    print("\nReceptive field:")
    print("=" * 50)
    print("conv1 (3x3, s=1): RF = 3")
    print("conv2 (3x3, s=1): RF = 3 + (3-1)*1 = 5")
    print("pool1 (2x2, s=2): RF = 5 + (2-1)*1 = 6; cumulative stride becomes 2")
    print("conv3 (3x3, s=1): RF = 6 + (3-1)*2 = 10")
    print("conv4 (3x3, s=1): RF = 10 + (3-1)*2 = 14")
    print("GAP: averages over the entire feature map")

    print("\nConclusion:")
    print("Before the GAP, each output pixel corresponds to a 14x14 region of the input")
    print("In practice the *effective* receptive field is smaller and concentrated at the center")

visualize_receptive_field()

5. The Evolution of CNN Architectures

5.1 LeNet: Where CNNs Began

In 1998, Yann LeCun introduced LeNet-5, one of the first commercially deployed CNN architectures, used for handwritten digit recognition.

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5 (modernized: ReLU and max pooling instead of the original tanh and average pooling)"""

    def __init__(self, num_classes=10):
        super().__init__()

        # Feature extractor
        self.features = nn.Sequential(
            # C1: 1x32x32 -> 6x28x28
            nn.Conv2d(1, 6, kernel_size=5, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # -> 6x14x14

            # C3: 6x14x14 -> 16x10x10
            nn.Conv2d(6, 16, kernel_size=5, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # -> 16x5x5
        )

        # Classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

# Test
lenet = LeNet5()
x = torch.randn(1, 1, 32, 32)
output = lenet(x)
print(f"LeNet-5 input: {x.shape} -> output: {output.shape}")

# Parameter count
total_params = sum(p.numel() for p in lenet.parameters())
print(f"Total parameters: {total_params:,}")

5.2 AlexNet: The Deep Learning Breakthrough

In 2012, Alex Krizhevsky et al. introduced AlexNet, whose breakthrough result in the ImageNet competition kicked off the deep learning era.

import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """AlexNet: winner of the 2012 ImageNet competition"""

    def __init__(self, num_classes=1000):
        super().__init__()

        self.features = nn.Sequential(
            # conv1: 224x224x3 -> 55x55x96
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 27x27x96

            # conv2: 27x27x96 -> 27x27x256
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 13x13x256

            # conv3-5: 13x13x256 -> 13x13x384 -> 13x13x384 -> 13x13x256
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 6x6x256
        )

        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))

        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

alexnet = AlexNet()
x = torch.randn(1, 3, 224, 224)
output = alexnet(x)
print(f"AlexNet input: {x.shape} -> output: {output.shape}")
print(f"Parameters: {sum(p.numel() for p in alexnet.parameters()):,}")

5.3 VGGNet: Simplicity Wins

In 2014, VGGNet demonstrated the importance of depth by using smaller (3x3) kernels and deeper networks.

import torch
import torch.nn as nn

class VGG16(nn.Module):
    """VGG16: a deeper network built from small kernels"""

    def __init__(self, num_classes=1000):
        super().__init__()

        self.features = nn.Sequential(
            # Block 1: 224x224x3 -> 112x112x64
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),

            # Block 2: 112x112x64 -> 56x56x128
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),

            # Block 3: 56x56x128 -> 28x28x256
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),

            # Block 4: 28x28x256 -> 14x14x512
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),

            # Block 5: 14x14x512 -> 7x7x512
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )

        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

vgg16 = VGG16()
x = torch.randn(1, 3, 224, 224)
output = vgg16(x)
print(f"VGG16 input: {x.shape} -> output: {output.shape}")
print(f"Parameters: {sum(p.numel() for p in vgg16.parameters()):,}")
print(f"\nKey insights of VGG16:")
print(f"1. Replace large kernels with stacks of 3x3 convolutions (e.g., three 3x3 instead of one 7x7)")
print(f"2. Three 3x3 layers give a 7x7 receptive field with fewer weights (3*9 vs 49 per channel pair)")
print(f"3. More nonlinear transformations along the way")
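The parameter claim can be checked directly. A sketch comparing a stack of three 3x3 convolutions with one 7x7 convolution for a C-to-C mapping (C=512 is an illustrative choice; biases are disabled for a clean comparison):

```python
import torch.nn as nn

C = 512  # illustrative channel count

# Three stacked 3x3 convolutions: 7x7 receptive field
stack_3x3 = nn.Sequential(*[nn.Conv2d(C, C, 3, padding=1, bias=False) for _ in range(3)])
params_3x3 = sum(p.numel() for p in stack_3x3.parameters())

# One 7x7 convolution: same receptive field
single_7x7 = nn.Conv2d(C, C, 7, padding=3, bias=False)
params_7x7 = sum(p.numel() for p in single_7x7.parameters())

print(f"3 x (3x3): {params_3x3:,}")             # 3 * C*C*9  = 7,077,888
print(f"1 x (7x7): {params_7x7:,}")             # C*C*49     = 12,845,056
print(f"ratio: {params_7x7 / params_3x3:.2f}x")  # ~1.81x more for the 7x7
```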

6. Classic CNN Architectures Compared

import torch
import torch.nn as nn

def count_parameters(model):
    """Count trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Compare the classic architectures (classes defined in the previous sections)
architectures = {
    'LeNet-5': LeNet5(),
    'AlexNet': AlexNet(),
    'VGG16': VGG16(),
}

for name, m in architectures.items():
    print(f"{name}: {count_parameters(m):,} trainable parameters")

print("\nClassic CNN architectures:")
print("=" * 70)
print(f"{'Architecture':<15} {'Input size':<15} {'Params':<15} {'ImageNet Top-1':<15}")
print("-" * 70)
print(f"{'LeNet-5':<15} {'32x32x1':<15} {'60K':<15} {'N/A (MNIST)':<15}")
print(f"{'AlexNet':<15} {'224x224x3':<15} {'60M':<15} {'62.5%':<15}")
print(f"{'VGG16':<15} {'224x224x3':<15} {'138M':<15} {'74.4%':<15}")
print("=" * 70)

print("\nKey takeaways from this evolution:")
print("1. Networks got deeper (5 -> 8 -> 16 weighted layers)")
print("2. Kernels got smaller (11x11 -> 5x5 -> 3x3)")
print("3. Stacks of small kernels replaced large kernels")
print("4. ReLU activations sped up training")
print("5. Dropout and data augmentation curbed overfitting")

7. Hands-On: Building a Modern CNN

7.1 Residual Connections

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Residual block: makes deep networks trainable"""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # If the input and output shapes differ, adjust the shortcut with a 1x1 conv
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # residual connection
        out = F.relu(out)
        return out

# Test the block
block = ResidualBlock(64, 128, stride=2)
x = torch.randn(1, 64, 56, 56)
out = block(x)
print(f"Residual block input: {x.shape}")
print(f"Residual block output: {out.shape}")

# Why residual connections help
print("\nWhat residual connections do:")
print("1. Gradients can flow straight through the shortcut, mitigating vanishing gradients")
print("2. The network can easily learn an identity mapping")
print("3. Training very deep networks becomes feasible")
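Point 2 can be made concrete: if the final BatchNorm scale of the residual branch is zeroed (a common initialization trick in ResNet training recipes), the branch contributes nothing, and a stride-1 block with matching channels reduces to the identity for non-negative input (as produced by a preceding ReLU). A sketch with the block unrolled inline; eval mode is used so BatchNorm applies its (default) running statistics deterministically:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

conv1 = nn.Conv2d(64, 64, 3, padding=1, bias=False)
bn1 = nn.BatchNorm2d(64)
conv2 = nn.Conv2d(64, 64, 3, padding=1, bias=False)
bn2 = nn.BatchNorm2d(64)
nn.init.zeros_(bn2.weight)  # zero the residual branch's final scale

bn1.eval()
bn2.eval()  # deterministic: use running statistics

x = torch.relu(torch.randn(1, 64, 8, 8))  # non-negative, like the output of a preceding ReLU
with torch.no_grad():
    residual = bn2(conv2(F.relu(bn1(conv1(x)))))  # all zeros: bn2.weight == 0, bn2.bias == 0
    out = F.relu(residual + x)

print(torch.allclose(out, x))  # True: the block starts out as an identity mapping
```

From this identity starting point, training only has to learn the *deviation* from identity, which is the intuition behind "residual" learning.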

7.2 A Complete CNN Classifier

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModernCNN(nn.Module):
    """A modern CNN architecture example (uses the ResidualBlock defined above)"""

    def __init__(self, num_classes=10):
        super().__init__()

        # Stem convolution
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )

        # Stage 1: 224x224 -> 112x112
        self.stage1 = self._make_stage(64, 128, num_blocks=2, stride=2)

        # Stage 2: 112x112 -> 56x56
        self.stage2 = self._make_stage(128, 256, num_blocks=2, stride=2)

        # Stage 3: 56x56 -> 28x28
        self.stage3 = self._make_stage(256, 512, num_blocks=2, stride=2)

        # Stage 4: 28x28 -> 14x14
        self.stage4 = self._make_stage(512, 1024, num_blocks=2, stride=2)

        # Global average pooling + classifier
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(1024, num_classes)
        )

        # Weight initialization
        self._init_weights()

    def _make_stage(self, in_channels, out_channels, num_blocks, stride):
        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels, 1))
        return nn.Sequential(*layers)

    def _init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.stem(x)
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

# Test
model = ModernCNN(num_classes=10)
x = torch.randn(1, 3, 224, 224)
output = model(x)
print(f"ModernCNN input: {x.shape}")
print(f"ModernCNN output: {output.shape}")
print(f"Parameters: {count_parameters(model):,}")

# Pretrained models from torchvision
print("\n" + "=" * 50)
print("In practice, prefer torchvision's pretrained models:")
print("-" * 50)

import torchvision.models as models

# ResNet18 (pass weights=models.ResNet18_Weights.DEFAULT to download pretrained weights;
# the old pretrained= argument is deprecated in torchvision 0.17)
resnet18 = models.resnet18(weights=None)
print(f"ResNet18 parameters: {count_parameters(resnet18):,}")

# ResNet50
resnet50 = models.resnet50(weights=None)
print(f"ResNet50 parameters: {count_parameters(resnet50):,}")

# EfficientNet_B0
efficientnet_b0 = models.efficientnet_b0(weights=None)
print(f"EfficientNet_B0 parameters: {count_parameters(efficientnet_b0):,}")

8. CNN Visualization

8.1 Visualizing Kernels

import torch
import torchvision.models as models
import matplotlib.pyplot as plt
import numpy as np

def visualize_conv_weights():
    """Visualize first-layer convolution weights."""

    # Load the model (weights=None gives random filters; use
    # weights=models.VGG16_Weights.DEFAULT to see learned filters)
    model = models.vgg16(weights=None)
    model.eval()

    # First convolutional layer's weights
    first_conv = model.features[0]
    weights = first_conv.weight.data.cpu().numpy()

    print(f"First conv layer weight shape: {weights.shape}")
    print(f"Weight range: [{weights.min():.3f}, {weights.max():.3f}]")

    # Plot a subset of the filters
    fig, axes = plt.subplots(4, 8, figsize=(16, 8))
    axes = axes.flatten()

    for i in range(min(32, weights.shape[0])):
        # Normalize to [0, 1] for display
        w = weights[i].transpose(1, 2, 0)  # (3, kH, kW) -> (kH, kW, 3) for imshow
        w = (w - w.min()) / (w.max() - w.min())

        axes[i].imshow(w)
        axes[i].axis('off')
        axes[i].set_title(f'Filter {i}')

    plt.suptitle('VGG16 first-layer filters')
    plt.tight_layout()
    plt.show()

    print("\nReading the filters (with pretrained weights):")
    print("1. Each filter learns to detect a particular texture pattern")
    print("2. RGB filters can respond to colored textures")
    print("3. Different filters pick up different orientations and frequencies")

visualize_conv_weights()

8.2 Visualizing Feature Maps

import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt

def visualize_feature_maps(image_path):
    """Visualize intermediate feature maps."""

    # Load the model (use weights=models.VGG16_Weights.DEFAULT for meaningful features)
    model = models.vgg16(weights=None)
    model.eval()

    # Forward hooks to capture intermediate features
    features = []

    def hook(module, input, output):
        features.append(output.detach())

    # Register hooks on the pooling layer of each of the first four blocks
    hook1 = model.features[4].register_forward_hook(hook)   # after block 1 pool
    hook2 = model.features[9].register_forward_hook(hook)   # after block 2 pool
    hook3 = model.features[16].register_forward_hook(hook)  # after block 3 pool
    hook4 = model.features[23].register_forward_hook(hook)  # after block 4 pool

    # Load and preprocess the image
    img = Image.open(image_path).convert('RGB')
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    x = transform(img).unsqueeze(0)

    # Forward pass
    with torch.no_grad():
        _ = model(x)

    # Remove the hooks
    hook1.remove()
    hook2.remove()
    hook3.remove()
    hook4.remove()

    # Plot: one row per layer, first 8 channels of each layer
    fig, axes = plt.subplots(4, 8, figsize=(20, 10))

    layer_names = ['Block1 Pool', 'Block2 Pool', 'Block3 Pool', 'Block4 Pool']

    for layer_idx, feature in enumerate(features):
        feature = feature[0]  # drop the batch dimension
        for col in range(8):
            ax = axes[layer_idx, col]

            # Normalize each feature map for display
            fmap = feature[col].numpy()
            fmap = (fmap - fmap.min()) / (fmap.max() - fmap.min() + 1e-8)

            ax.imshow(fmap, cmap='viridis')
            ax.axis('off')
            if col == 0:
                ax.set_ylabel(layer_names[layer_idx], fontsize=12)

    plt.suptitle('VGG16 feature maps by layer', fontsize=14)
    plt.tight_layout()
    plt.show()

    print("\nReading the feature maps:")
    print("1. Shallow maps keep more spatial detail: edges and textures")
    print("2. Deep maps are more abstract, responding to semantic concepts")
    print("3. Different channels attend to different aspects of the image")
    print("4. With depth, spatial resolution drops while channel count grows")

# No image file is bundled with this chapter, so the call is left commented out
# visualize_feature_maps('example.jpg')

9. Common Pitfalls

Pitfall 1: miscalculating the post-convolution feature-map size

Symptom: the feature map does not have the size you expected

The correct formula:

output_size = floor((input_size + 2*padding - kernel_size) / stride) + 1

How to verify:

# Print shapes (or use torchsummary) to check
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=1)
x = torch.randn(1, 3, 224, 224)
out = conv(x)
print(f"Expected: 224, Got: {out.shape[2]}")

# Adaptive pooling guarantees a fixed output size regardless of the input
adaptive_pool = nn.AdaptiveAvgPool2d((7, 7))  # output is always 7x7
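The formula generalizes to dilated convolutions via an effective kernel size. A small helper (a sketch, not a PyTorch API), cross-checked against nn.Conv2d:

```python
import torch
import torch.nn as nn

def conv_out_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output spatial size of a convolution (matches PyTorch's formula)."""
    effective_k = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * padding - effective_k) // stride + 1

# Cross-check against real layers
x = torch.randn(1, 3, 224, 224)
for k, s, p, d in [(3, 1, 1, 1), (3, 2, 1, 1), (7, 2, 3, 1), (3, 1, 2, 2)]:
    conv = nn.Conv2d(3, 8, k, stride=s, padding=p, dilation=d)
    assert conv(x).shape[-1] == conv_out_size(224, k, s, p, d)
print("formula matches nn.Conv2d for all tested configurations")
```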

Pitfall 2: poor kernel-size choices

Symptom: the model is too large, or accuracy lags

Suggestions:

# 3x3 is the most common kernel size
# Stacks of 3x3 can replace 5x5 or 7x7: fewer parameters, more nonlinearity

# 1x1 convolutions change the channel count without touching spatial size
conv_1x1 = nn.Conv2d(64, 128, 1)  # channels only

# Dilated convolutions enlarge the receptive field without extra parameters
dilated_conv = nn.Conv2d(64, 64, 3, padding=2, dilation=2)  # receptive field = 5x5

Pitfall 3: BatchNorm in the wrong place

Symptom: unstable training, exploding loss

Correct order:

# Correct order: Conv -> BatchNorm -> ReLU
nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True)
)

# When fine-tuning a pretrained model, keep its original ordering

Pitfall 4: ignoring GPU memory management

Symptom: out-of-memory (OOM) errors

Fixes:

# 1. Reduce the batch size
batch_size = 16  # instead of 64

# 2. Gradient accumulation
accumulation_steps = 4
for i, (images, labels) in enumerate(dataloader):
    outputs = model(images)
    loss = criterion(outputs, labels) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

# 3. Mixed-precision training
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()

for images, labels in dataloader:
    images = images.cuda()
    labels = labels.cuda()

    optimizer.zero_grad()
    with autocast():
        outputs = model(images)
        loss = criterion(outputs, labels)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

# 4. Free unneeded tensors promptly
del intermediate_variables
torch.cuda.empty_cache()

10. Chapter Summary

After this chapter you should have a grasp of:

  1. How convolution works: local connectivity, weight sharing, and parameter efficiency
  2. Convolution hyperparameters: kernel size, stride, and padding, and how they determine output size
  3. Pooling layers: what max pooling and average pooling do
  4. Receptive fields: how to compute the receptive field of each layer
  5. Classic architectures: the progression from LeNet to AlexNet to VGGNet
  6. Modern CNN components: residual connections and related building blocks
  7. CNN visualization: plotting kernels and feature maps
  8. Practical tips: how to avoid common CNN mistakes

In one sentence: convolutional networks process images efficiently through local connectivity and weight sharing, stacks of small kernels can replace large ones, and residual connections make very deep networks trainable.


11. Exercises

  1. Convolution from scratch: implement multi-channel convolution by hand and verify it against PyTorch
  2. Receptive fields: compute the receptive field of each layer for a given architecture
  3. Architecture design: design a fully convolutional network for medical image segmentation
  4. Initialization: compare how different weight initializations affect training
  5. Modern architectures: analyze the design philosophies of ResNet and EfficientNet

Coming next: Chapter 8, "Modern CNN Architectures: ResNet and EfficientNet", covers today's most popular CNN architectures and the core ideas of residual learning and compound scaling.


If this chapter helped, likes, bookmarks, and follows are appreciated. Questions are welcome in the comments.
