[Computer Vision in Practice] Chapter 7 | CNNs Explained: The Core Engine of Visual Recognition
Welcome to Chapter 7 of the Computer Vision in Practice tutorial series. In Chapter 6 we covered PyTorch basics and built simple neural networks; in this chapter we dive into the convolutional neural network (CNN), the most important deep learning model in computer vision.
Ever since AlexNet's breakthrough result in the 2012 ImageNet competition, CNNs have been the foundational architecture for image classification, object detection, semantic segmentation, and other vision tasks. Even after newer architectures such as the Transformer appeared, CNNs remain an indispensable building block thanks to their efficiency and their strong ability to capture multi-scale features.
1. Environment
- Python: 3.12+
- PyTorch: 2.2+
- torchvision: 0.17+
- NumPy: 1.26+
- matplotlib: 3.8+
- CUDA: 12.1+ (a GPU is strongly recommended)
2. Convolution in Detail
2.1 From Fully Connected Layers to Convolutional Layers
In a traditional neural network (e.g. an MLP), every neuron is connected to all neurons in the previous layer. This "fully connected" structure has serious problems on high-dimensional image data:
Parameter explosion: for a 224x224x3 RGB image, a first hidden layer with 1000 neurons needs 224x224x3x1000 ≈ 150 million parameters.
Loss of spatial information: flattening the image into a 1D vector throws away all spatial relationships between pixels.
The core ideas behind convolutional layers:
- Local connectivity: each neuron is connected only to a local region of the input
- Weight sharing: the same kernel's parameters are reused across the whole input
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Compare the parameter counts of a fully connected layer and a conv layer
# Fully connected layer
fc = nn.Linear(in_features=224*224*3, out_features=1000)
fc_params = sum(p.numel() for p in fc.parameters())
print(f"FC layer parameters: {fc_params:,} (about {fc_params/1e6:.1f} million)")
# Convolutional layer
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
conv_params = sum(p.numel() for p in conv.parameters())
print(f"Conv layer parameters (64@3x3): {conv_params:,}")
# Add another conv layer
conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
total_conv_params = sum(p.numel() for p in conv.parameters()) + \
                    sum(p.numel() for p in conv2.parameters())
print(f"Two conv layers: {total_conv_params:,} parameters")
print(f"\nParameter savings: {fc_params/total_conv_params:.0f}x")
2.2 How Convolution Works
Kernel (filter): a small matrix (e.g. 3x3 or 5x5) holding the learnable weights.
The convolution procedure:
- The kernel slides over the input image (from top-left to bottom-right)
- At each position, compute the sum of element-wise products between the kernel and the local input region
- The output is called a feature map
Key parameters:
- Kernel size: typically 3x3, 5x5, or 7x7
- Stride: how many pixels the kernel moves per step
- Padding: the number of pixels added around the input border
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

def manual_conv2d(input_tensor, kernel, stride=1, padding=0):
    """
    Manual 2D convolution.
    input_tensor: (C, H, W)
    kernel: (C, kH, kW)
    """
    C, H, W = input_tensor.shape
    kH, kW = kernel.shape[1:]
    # Apply padding
    if padding > 0:
        pad_input = torch.zeros(C, H + 2*padding, W + 2*padding)
        pad_input[:, padding:padding+H, padding:padding+W] = input_tensor
    else:
        pad_input = input_tensor
    # Output size
    out_H = (H + 2*padding - kH) // stride + 1
    out_W = (W + 2*padding - kW) // stride + 1
    # Initialize output
    output = torch.zeros(out_H, out_W)
    # Slide the kernel
    for i in range(out_H):
        for j in range(out_W):
            h_start = i * stride
            w_start = j * stride
            region = pad_input[:, h_start:h_start+kH, w_start:w_start+kW]
            output[i, j] = (region * kernel).sum()
    return output

# Create the input and a kernel
input_tensor = torch.randn(1, 5, 5)  # single channel, 5x5
kernel = torch.tensor([[1, 0, -1],
                       [2, 0, -2],
                       [1, 0, -1]], dtype=torch.float32).unsqueeze(0)  # Sobel edge detector, shape (1, 3, 3)
# Manual computation; pass the full (C, H, W) tensors, not single-channel slices
manual_result = manual_conv2d(input_tensor, kernel, stride=1, padding=0)
# Verify against PyTorch; Conv2d weights have shape (C_out, C_in, kH, kW)
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)
conv.weight.data = kernel.unsqueeze(0)            # (1, 1, 3, 3)
output = conv(input_tensor.unsqueeze(0))          # add a batch dim: (1, 1, 5, 5)
print("Manual result:")
print(manual_result)
print("\nPyTorch result:")
print(output[0, 0])
print(f"\nMax difference: {(manual_result - output[0, 0]).abs().max():.6f}")
2.3 Convolution Layer Parameters in Detail
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Different convolution configurations
print("Conv layer parameter counts:")
print("=" * 50)
# Example 1: standard convolution
conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
params1 = sum(p.numel() for p in conv1.parameters())
print(f"Conv(3->64, 3x3, pad=1)")
print(f"  parameters: {params1:,}")
print(f"  computed as: 64 x 3 x 3 x 3 + 64 = {64*3*3*3 + 64}")
# Example 2: no padding
conv2 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=0)
params2 = sum(p.numel() for p in conv2.parameters())
print(f"\nConv(3->64, 3x3, pad=0)")
print(f"  parameters: {params2:,}")
print(f"  output size: (H-2) x (W-2)")
# Example 3: a large kernel
conv3 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, padding=3)
params3 = sum(p.numel() for p in conv3.parameters())
print(f"\nConv(3->64, 7x7, pad=3)")
print(f"  parameters: {params3:,}")
print(f"  computed as: 64 x 7 x 7 x 3 + 64 = {64*7*7*3 + 64}")
# Example 4: dilated (atrous) convolution
conv4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=2, dilation=2)
params4 = sum(p.numel() for p in conv4.parameters())
print(f"\nConv(64->64, 3x3, pad=2, dilation=2)")
print(f"  parameters: {params4:,}")
print(f"  receptive field: 5x5 (equivalent to a 5x5 kernel)")
# Verify the output-size formula
print("\n" + "=" * 50)
print("Conv output-size formula:")
print("output_size = floor((input_size + 2*padding - kernel_size) / stride) + 1")
# Check in practice
test_input = torch.randn(1, 3, 32, 32)
print(f"\nInput: {test_input.shape}")
test_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=1)
output = test_conv(test_input)
print(f"Conv(3->64, 3x3, pad=1, stride=1): {output.shape}")
test_conv_stride2 = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=2)
output_stride2 = test_conv_stride2(test_input)
print(f"Conv(3->64, 3x3, pad=1, stride=2): {output_stride2.shape}")
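The formula printed above omits dilation. A small helper, sketched here as an illustration (the `conv_out_size` name is our own, not a PyTorch API), handles the general case and can be checked against the dilated conv from Example 4:

```python
import torch
import torch.nn as nn

# General formula, including dilation:
# out = floor((in + 2*padding - dilation*(kernel_size-1) - 1) / stride) + 1
def conv_out_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

x = torch.randn(1, 64, 32, 32)
conv = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)
out = conv(x)
predicted = conv_out_size(32, 3, stride=1, padding=2, dilation=2)
print(out.shape, predicted)  # padding=2 exactly offsets dilation=2, so 32 stays 32
```

With `dilation=1` this reduces to the simpler formula above; note that `padding=2` is exactly what keeps a 3x3, dilation-2 conv size-preserving.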
2.4 Multi-Channel Convolution
import torch
import torch.nn as nn

# Multi-channel input, multi-channel output
# Input:  (batch, C_in, H, W)
# Output: (batch, C_out, H', W')
input_tensor = torch.randn(1, 3, 8, 8)  # 3-channel input
print(f"Input shape: {input_tensor.shape}")
# A multi-channel conv layer
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
output = conv(input_tensor)
print(f"Output shape: {output.shape}")
# Inspect the weight shape
print(f"\nWeight shape: {conv.weight.shape}")  # [8, 3, 3, 3]: 8 output channels, each with weights for all 3 input channels
print(f"Bias shape: {conv.bias.shape}")        # [8]: one bias per output channel
# Each output channel is the sum of convolutions over all input channels
print("\neach output channel = Σ(input channel_i * kernel_i) + bias")
print(f"so parameter count = C_out x C_in x K x K + C_out")
print(f"                   = 8 x 3 x 3 x 3 + 8 = {8*3*3*3 + 8}")
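The summation formula above can be checked numerically. The sketch below (an illustrative example using `F.conv2d` directly) rebuilds output channel 0 as a sum of per-input-channel convolutions plus the bias:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)
weight = torch.randn(8, 3, 3, 3)   # (C_out, C_in, kH, kW)
bias = torch.randn(8)

# Full multi-channel convolution
full = F.conv2d(x, weight, bias, padding=1)

# Output channel 0, rebuilt channel by channel:
# convolve each input channel with its own 3x3 slice, sum, add the bias
manual = sum(
    F.conv2d(x[:, c:c+1], weight[0:1, c:c+1], padding=1)
    for c in range(3)
) + bias[0]

print((full[:, 0:1] - manual).abs().max())  # ~0, up to float rounding
```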
3. Pooling Layers
3.1 How Pooling Works
The pooling layer is another key CNN component. Its main purposes are:
- Shrinking feature maps: less computation and fewer downstream parameters
- Enlarging the receptive field: later convolutions see a larger region of the image
- Some translation invariance: small shifts do not change the pooled result
Max pooling keeps the maximum value in each window.
Average pooling keeps the mean value in each window.
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Max pooling
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
# Average pooling
avgpool = nn.AvgPool2d(kernel_size=2, stride=2)
# Test input
input_tensor = torch.arange(1, 17, dtype=torch.float32).view(1, 1, 4, 4)
print("Input feature map:")
print(input_tensor[0, 0])
# Apply pooling
output_max = maxpool(input_tensor)
output_avg = avgpool(input_tensor)
print("\nMax pooling (2x2, stride=2):")
print(output_max[0, 0])
print("\nAverage pooling (2x2, stride=2):")
print(output_avg[0, 0])
# Verify the output size
print("\nPooling output-size formula (no padding):")
print("output_size = floor((input_size - kernel_size) / stride) + 1")
print("which reduces to input_size / stride when kernel_size == stride")
print(f"Input: 4x4, output: {output_max.shape[2]}x{output_max.shape[3]}")
# Global average pooling
gap = nn.AdaptiveAvgPool2d((1, 1))
global_pool = gap(input_tensor)
print(f"\nGlobal average pooling: {input_tensor.shape} -> {global_pool.shape}")
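A tiny demo of the translation-invariance point: shift a single strong activation by one pixel within the same 2x2 window and the max-pooled output does not change (a sketch; the specific values are illustrative):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)

x = torch.zeros(1, 1, 4, 4)
x[0, 0, 1, 1] = 5.0        # a strong activation

x_shift = torch.zeros(1, 1, 4, 4)
x_shift[0, 0, 0, 0] = 5.0  # shifted one pixel, still inside the same window

print(pool(x)[0, 0])
print(pool(x_shift)[0, 0])
# Both keep the 5.0 in the same output cell: a shift inside one pooling
# window leaves the result unchanged. Shifts across window boundaries,
# of course, do change it.
```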
3.2 Pooling vs. Strided Convolution
import torch
import torch.nn as nn

# Both approaches reduce spatial resolution
# Option 1: pooling + convolution
pool_conv = nn.Sequential(
    nn.MaxPool2d(kernel_size=2, stride=2),       # 32x32 -> 16x16
    nn.Conv2d(3, 64, kernel_size=3, padding=1)   # stays 16x16
)
# Option 2: strided convolution
stride_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=2)  # 32x32 -> 16x16
# Compare
x = torch.randn(1, 3, 32, 32)
out1 = pool_conv(x)
out2 = stride_conv(x)
print(f"Input: {x.shape}")
print(f"Pool + conv:  {out1.shape}")
print(f"Strided conv: {out2.shape}")
# Which is better?
print("\nComparison:")
print("Pool + conv:  fixed, parameter-free downsampling; max pooling keeps the strongest activations")
print("Strided conv: the downsampling is learned jointly with the features, in a single op (more common in modern networks)")
4. Receptive Field
4.1 The Concept
The **receptive field** is the region of the input image that a single pixel of an output feature map depends on. Understanding receptive fields is essential for designing networks and debugging models.
import torch
import torch.nn as nn

def calculate_receptive_field(layers_info):
    """
    Compute the receptive field.
    layers_info: list of tuples (kernel_size, stride)
    """
    rf = 1            # initial receptive field
    stride_accum = 1
    print("Layer-by-layer receptive field:")
    print("-" * 50)
    for i, (ks, s) in enumerate(layers_info):
        new_rf = rf + (ks - 1) * stride_accum
        print(f"Layer {i+1}: kernel={ks}x{ks}, stride={s}")
        print(f"  RF_before={rf}, stride_accum={stride_accum}")
        print(f"  RF_after = {rf} + ({ks}-1) * {stride_accum} = {new_rf}")
        rf = new_rf
        stride_accum *= s
    print("-" * 50)
    return rf

# Example: two 3x3 convolutions
print("Two 3x3 convs (stride=1):")
rf1 = calculate_receptive_field([(3, 1), (3, 1)])
print(f"Final receptive field: {rf1}x{rf1}")
print("Note: two 3x3 convs have the same receptive field as one 5x5 conv!")
print("\n" + "=" * 50)
# Example: VGG-style network (two 3x3 convs per block, 2x2 max pool between blocks)
print("\nVGG-style network:")
rf_vgg = calculate_receptive_field([(3, 1), (3, 1)])  # block 1
print(f"After block 1: {rf_vgg}x{rf_vgg}")
rf_vgg2 = calculate_receptive_field([(3, 1), (3, 1),
                                     (2, 2),            # 2x2 max pool
                                     (3, 1), (3, 1)])   # block 2
print(f"After block 2: {rf_vgg2}x{rf_vgg2}")
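The claim that two 3x3 convs see a 5x5 region can also be checked empirically with autograd: the gradient of a single output pixel with respect to the input is nonzero exactly inside that pixel's receptive field. A sketch with random weights (cancellation to exactly zero is practically impossible):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 1, 3, padding=1),
    nn.Conv2d(1, 1, 3, padding=1),
)
x = torch.randn(1, 1, 9, 9, requires_grad=True)
out = net(x)
out[0, 0, 4, 4].backward()     # backprop from the center output pixel

grad = x.grad[0, 0]
rows = (grad.abs() > 0).any(dim=1).sum().item()
cols = (grad.abs() > 0).any(dim=0).sum().item()
print(rows, cols)  # 5 5 -> the receptive field of two 3x3 convs is 5x5
```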
4.2 Effective Receptive Field
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

def visualize_receptive_field():
    """Walk through the receptive field of each layer of a small CNN"""
    # A simple CNN
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1),     # conv1
        nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1),    # conv2
        nn.ReLU(),
        nn.MaxPool2d(2, 2),                 # pool1
        nn.Conv2d(64, 128, 3, padding=1),   # conv3
        nn.ReLU(),
        nn.Conv2d(128, 128, 3, padding=1),  # conv4
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1)             # GAP
    )
    # Print the architecture
    print("Architecture:")
    print("=" * 50)
    for i, layer in enumerate(model):
        if isinstance(layer, nn.Conv2d):
            print(f"Layer {i}: Conv2d({layer.in_channels}->{layer.out_channels}, "
                  f"kernel={layer.kernel_size[0]}, stride={layer.stride[0]}, "
                  f"padding={layer.padding[0]})")
        elif isinstance(layer, nn.MaxPool2d):
            print(f"Layer {i}: MaxPool2d(kernel={layer.kernel_size}, stride={layer.stride})")
        elif isinstance(layer, nn.ReLU):
            print(f"Layer {i}: ReLU")
        elif isinstance(layer, nn.AdaptiveAvgPool2d):
            print(f"Layer {i}: AdaptiveAvgPool2d(1)")
    print("\nReceptive field computation:")
    print("=" * 50)
    print("conv1 (3x3, s=1): RF = 3")
    print("conv2 (3x3, s=1): RF = 3 + (3-1)*1 = 5")
    print("pool1 (2x2, s=2): RF = 5 + (2-1)*1 = 6, and the accumulated stride becomes 2")
    print("conv3 (3x3, s=1): RF = 6 + (3-1)*2 = 10")
    print("conv4 (3x3, s=1): RF = 10 + (3-1)*2 = 14")
    print("GAP: averages over the whole feature map")
    print("\nConclusion:")
    print("Each pixel of the conv4 output depends on a 14x14 region of the input.")
    print("In practice the effective receptive field is smaller, concentrated at the center.")

visualize_receptive_field()
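The "effective receptive field" remark can be made concrete with autograd: backpropagate from a center output pixel and look at where the input gradient mass lives. The sketch below uses 3x3 convs with weights forced positive so the center-concentration effect is easy to see; it is an illustrative setup, not a rigorous measurement:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Six 3x3 convs: theoretical RF is 13x13, but gradient mass concentrates at the center
convs = [nn.Conv2d(1, 1, 3, padding=1, bias=False) for _ in range(6)]
with torch.no_grad():
    for c in convs:
        c.weight.abs_()          # positive weights make the path-counting effect visible
net = nn.Sequential(*convs)

x = torch.randn(1, 1, 25, 25, requires_grad=True)
net(x)[0, 0, 12, 12].backward()  # center output pixel

g = x.grad[0, 0].abs()
center_share = g[10:15, 10:15].sum() / g.sum()   # central 5x5 of the 13x13 RF
print(f"center 5x5 share of gradient mass: {center_share:.2%}")
print(f"center vs RF-edge gradient: {g[12, 12]:.2e} vs {g[12, 18]:.2e}")
```

The central 5x5 covers only 25 of the 169 pixels of the theoretical field (~15%) yet carries far more than 15% of the gradient mass, and the gradient at the field's edge is orders of magnitude smaller than at its center.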
5. The Evolution of CNN Architectures
5.1 LeNet: Where CNNs Began
In 1998, Yann LeCun proposed LeNet-5, the first commercially successful CNN architecture, used for handwritten digit recognition. (The implementation below is modernized: it uses ReLU and max pooling in place of the original tanh activations and average pooling.)
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5: the first commercially successful CNN"""
    def __init__(self, num_classes=10):
        super().__init__()
        # Feature extractor
        self.features = nn.Sequential(
            # C1: 1x32x32 -> 6x28x28
            nn.Conv2d(1, 6, kernel_size=5, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # -> 6x14x14
            # C3: 6x14x14 -> 16x10x10
            nn.Conv2d(6, 16, kernel_size=5, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # -> 16x5x5
        )
        # Classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes)
        )
    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

# Quick test
lenet = LeNet5()
x = torch.randn(1, 1, 32, 32)
output = lenet(x)
print(f"LeNet-5 input: {x.shape} -> output: {output.shape}")
# Parameter count
total_params = sum(p.numel() for p in lenet.parameters())
print(f"Total parameters: {total_params:,}")
5.2 AlexNet: The Deep Learning Breakthrough
In 2012, Alex Krizhevsky and colleagues introduced AlexNet, whose breakthrough result in the ImageNet competition kicked off the deep learning era.
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """AlexNet: winner of the 2012 ImageNet competition"""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            # conv1: 224x224x3 -> 55x55x96
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 27x27x96
            # conv2: 27x27x96 -> 27x27x256
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 13x13x256
            # conv3-5: 13x13x256 -> 13x13x384 -> 13x13x384 -> 13x13x256
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 6x6x256
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )
    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

alexnet = AlexNet()
x = torch.randn(1, 3, 224, 224)
output = alexnet(x)
print(f"AlexNet input: {x.shape} -> output: {output.shape}")
print(f"Parameters: {sum(p.numel() for p in alexnet.parameters()):,}")
5.3 VGGNet: Simplicity Is Beautiful
In 2014, VGGNet demonstrated the importance of depth by combining smaller (3x3) kernels with much deeper networks.
import torch
import torch.nn as nn

class VGG16(nn.Module):
    """VGG16: a deeper network with smaller kernels"""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            # Block 1: 224x224x3 -> 112x112x64
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # Block 2: 112x112x64 -> 56x56x128
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # Block 3: 56x56x128 -> 28x28x256
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # Block 4: 28x28x256 -> 14x14x512
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # Block 5: 14x14x512 -> 7x7x512
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )
    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

vgg16 = VGG16()
x = torch.randn(1, 3, 224, 224)
output = vgg16(x)
print(f"VGG16 input: {x.shape} -> output: {output.shape}")
print(f"Parameters: {sum(p.numel() for p in vgg16.parameters()):,}")
print(f"\nVGG16's key insights:")
print(f"1. Replace large kernels with stacks of 3x3 convs (e.g. three 3x3 in place of one 7x7)")
print(f"2. Three 3x3 convs have a 7x7 receptive field with fewer weights (3x9=27 vs 49 per channel pair)")
print(f"3. The extra layers add more nonlinearity")
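The parameter claim can be verified directly. This sketch compares one 7x7 conv against a stack of three 3x3 convs at the same channel width (256 here is an arbitrary illustrative width; biases omitted to keep the comparison clean):

```python
import torch.nn as nn

C = 256
big = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)
stack = nn.Sequential(
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, 3, padding=1, bias=False),
)
p_big = sum(p.numel() for p in big.parameters())     # C*C*49
p_stack = sum(p.numel() for p in stack.parameters()) # 3*C*C*9 = C*C*27
print(f"7x7: {p_big:,}  vs  3x(3x3): {p_stack:,}  ratio: {p_big/p_stack:.2f}x")
```

Same 7x7 receptive field (by the formula from section 4.1), but the stack needs 27/49 of the weights and inserts two extra ReLUs.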
6. Comparing the Classic CNN Architectures
import torch
import torch.nn as nn

def count_parameters(model):
    """Count trainable parameters"""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Compare the classic architectures (uses LeNet5, AlexNet, VGG16 defined above)
architectures = {
    'LeNet-5': LeNet5(),
    'AlexNet': AlexNet(),
    'VGG16': VGG16(),
}
print("Classic CNN architectures:")
print("=" * 70)
print(f"{'Architecture':<15} {'Input size':<15} {'Params':<15} {'Top-1 accuracy':<20}")
print("-" * 70)
print(f"{'LeNet-5':<15} {'32x32x1':<15} {'60K':<15} {'~99% (MNIST)':<20}")
print(f"{'AlexNet':<15} {'224x224x3':<15} {'60M':<15} {'62.5% (ImageNet)':<20}")
print(f"{'VGG16':<15} {'224x224x3':<15} {'138M':<15} {'74.4% (ImageNet)':<20}")
print("=" * 70)
print("\nKey lessons from the evolution:")
print("1. Networks got deeper (5 -> 8 -> 16 weight layers)")
print("2. Kernels got smaller (11x11 -> 5x5 -> 3x3)")
print("3. Stacks of small kernels replaced single large kernels")
print("4. ReLU activations sped up training")
print("5. Dropout and data augmentation fought overfitting")
7. Hands-On: Building a Modern CNN
7.1 Residual Connections
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Residual block: makes deep networks trainable"""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # If the input and output shapes differ, adjust with a 1x1 conv
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # residual connection
        out = F.relu(out)
        return out

# Test the block
block = ResidualBlock(64, 128, stride=2)
x = torch.randn(1, 64, 56, 56)
out = block(x)
print(f"Residual block input:  {x.shape}")
print(f"Residual block output: {out.shape}")
# Why residual connections help
print("\nWhat residual connections buy you:")
print("1. Gradients flow directly through the shortcut, mitigating vanishing gradients")
print("2. The network can easily learn an identity mapping")
print("3. Training very deep networks becomes feasible")
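Point 1 can be illustrated with a toy experiment: compare the input-gradient norm through a deep plain stack against the same blocks wired with identity shortcuts. This sketch uses small linear+tanh blocks rather than conv blocks, purely to show the effect:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
depth, dim = 20, 64

# Plain stack: out = f_k(...f_1(x)); gradients pass through every Jacobian
plain = nn.Sequential(*[nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
                        for _ in range(depth)])

# Residual wiring of the same kind of blocks: x <- x + f_k(x)
blocks = nn.ModuleList([nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
                        for _ in range(depth)])

def residual_forward(x, blocks):
    for blk in blocks:
        x = x + blk(x)   # identity shortcut
    return x

x1 = torch.randn(1, dim, requires_grad=True)
plain(x1).sum().backward()
x2 = torch.randn(1, dim, requires_grad=True)
residual_forward(x2, blocks).sum().backward()

print(f"plain grad norm:    {x1.grad.norm():.2e}")   # shrinks with depth
print(f"residual grad norm: {x2.grad.norm():.2e}")   # stays healthy
```

The plain stack's gradient decays roughly geometrically with depth, while the identity path keeps the residual stack's gradient well-scaled at the input.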
7.2 A Complete CNN Classifier
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModernCNN(nn.Module):
    """An example of a modern CNN architecture"""
    def __init__(self, num_classes=10):
        super().__init__()
        # Stem convolution
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )
        # Stage 1: 224x224 -> 112x112
        self.stage1 = self._make_stage(64, 128, num_blocks=2, stride=2)
        # Stage 2: 112x112 -> 56x56
        self.stage2 = self._make_stage(128, 256, num_blocks=2, stride=2)
        # Stage 3: 56x56 -> 28x28
        self.stage3 = self._make_stage(256, 512, num_blocks=2, stride=2)
        # Stage 4: 28x28 -> 14x14
        self.stage4 = self._make_stage(512, 1024, num_blocks=2, stride=2)
        # Global average pooling + classifier
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(1024, num_classes)
        )
        # Weight initialization
        self._init_weights()
    def _make_stage(self, in_channels, out_channels, num_blocks, stride):
        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels, 1))
        return nn.Sequential(*layers)
    def _init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
    def forward(self, x):
        x = self.stem(x)
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

# Test (ResidualBlock and count_parameters are defined in the sections above)
model = ModernCNN(num_classes=10)
x = torch.randn(1, 3, 224, 224)
output = model(x)
print(f"ModernCNN input:  {x.shape}")
print(f"ModernCNN output: {output.shape}")
print(f"Parameters: {count_parameters(model):,}")
# Pretrained models from torchvision
print("\n" + "=" * 50)
print("In practice, prefer torchvision's pretrained models:")
print("-" * 50)
import torchvision.models as models
# ResNet18; pass weights='IMAGENET1K_V1' to download pretrained weights
# (the old pretrained= argument is deprecated in torchvision 0.17+)
resnet18 = models.resnet18(weights=None)
print(f"ResNet18 parameters: {count_parameters(resnet18):,}")
# ResNet50
resnet50 = models.resnet50(weights=None)
print(f"ResNet50 parameters: {count_parameters(resnet50):,}")
# EfficientNet_B0
efficientnet_b0 = models.efficientnet_b0(weights=None)
print(f"EfficientNet_B0 parameters: {count_parameters(efficientnet_b0):,}")
8. Visualizing CNNs
8.1 Visualizing Convolution Kernels
import torch
import torchvision.models as models
import matplotlib.pyplot as plt
import numpy as np

def visualize_conv_weights():
    """Visualize first-layer conv weights"""
    # Load a model; use weights='IMAGENET1K_V1' to see trained filters
    # (with weights=None you will only see the random initialization)
    model = models.vgg16(weights=None)
    model.eval()
    # First conv layer weights
    first_conv = model.features[0]
    weights = first_conv.weight.data.cpu().numpy()
    print(f"First conv layer weight shape: {weights.shape}")
    print(f"Weight range: [{weights.min():.3f}, {weights.max():.3f}]")
    # Show a subset of the filters
    fig, axes = plt.subplots(4, 8, figsize=(16, 8))
    axes = axes.flatten()
    for i in range(min(32, weights.shape[0])):
        # Normalize to [0, 1] for display
        w = weights[i].transpose(1, 2, 0)  # (3, kH, kW) -> (kH, kW, 3)
        w = (w - w.min()) / (w.max() - w.min())
        axes[i].imshow(w)
        axes[i].axis('off')
        axes[i].set_title(f'Filter {i}')
    plt.suptitle('VGG16 first-layer filters')
    plt.tight_layout()
    plt.show()
    print("\nReading the filters:")
    print("1. Each filter learns to detect one particular texture pattern")
    print("2. Filters spanning the RGB channels can detect colored textures")
    print("3. With trained weights, different filters show clear orientation and frequency selectivity")

visualize_conv_weights()
8.2 Visualizing Feature Maps
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt

def visualize_feature_maps(image_path):
    """Visualize intermediate feature maps"""
    # Load a model; use weights='IMAGENET1K_V1' for meaningful features
    model = models.vgg16(weights=None)
    model.eval()
    # Hooks to capture intermediate activations
    features = []
    def hook(module, input, output):
        features.append(output.detach())
    # Register hooks on the pooling layers (torchvision's VGG16 has
    # MaxPool2d at features indices 4, 9, 16, 23, 30)
    hook1 = model.features[4].register_forward_hook(hook)   # after pool1
    hook2 = model.features[9].register_forward_hook(hook)   # after pool2
    hook3 = model.features[16].register_forward_hook(hook)  # after pool3
    hook4 = model.features[23].register_forward_hook(hook)  # after pool4
    # Load and preprocess the image
    img = Image.open(image_path).convert('RGB')
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    x = transform(img).unsqueeze(0)
    # Forward pass
    with torch.no_grad():
        _ = model(x)
    # Remove the hooks
    for h in (hook1, hook2, hook3, hook4):
        h.remove()
    # Plot: one row per layer, first 8 channels of each
    fig, axes = plt.subplots(4, 8, figsize=(20, 10))
    layer_names = ['Block1 Pool', 'Block2 Pool', 'Block3 Pool', 'Block4 Pool']
    for layer_idx, feature in enumerate(features):
        feature = feature[0]  # drop the batch dimension
        for i in range(min(8, feature.shape[0])):
            ax = axes[layer_idx, i]
            # Normalize each feature map for display
            fmap = feature[i].numpy()
            fmap = (fmap - fmap.min()) / (fmap.max() - fmap.min() + 1e-8)
            ax.imshow(fmap, cmap='viridis')
            ax.axis('off')
            if i == 0:
                ax.set_title(layer_names[layer_idx], fontsize=12, loc='left')
    plt.suptitle('VGG16 feature maps by layer', fontsize=14)
    plt.tight_layout()
    plt.show()
    print("\nReading the feature maps:")
    print("1. Shallow maps keep more spatial detail and respond to edges and textures")
    print("2. Deep maps are more abstract and respond to semantic concepts")
    print("3. Different channels attend to different aspects of the image")
    print("4. With depth, spatial resolution drops while channel count grows")

# Skipped here because no image file ships with the tutorial
# visualize_feature_maps('example.jpg')
9. Pitfalls to Avoid
Pitfall 1: Miscalculating the post-convolution feature map size
Symptom: feature maps don't match the expected size
The correct formula:
output_size = floor((input_size + 2*padding - kernel_size) / stride) + 1
How to verify:
# Check with a quick forward pass (or a tool like torchsummary)
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=1)
x = torch.randn(1, 3, 224, 224)
out = conv(x)
print(f"Expected: 224, Got: {out.shape[2]}")
# Adaptive pooling guarantees a fixed output size
adaptive_pool = nn.AdaptiveAvgPool2d((7, 7))  # always outputs 7x7, whatever the input
Pitfall 2: Poor kernel size choices
Symptom: the model is too large or accuracy is low
Suggestions:
# 3x3 is the most common kernel size
# Stacks of 3x3 can replace 5x5 or 7x7 with fewer parameters and more nonlinearity
# 1x1 convs change the channel count without touching the spatial size
conv_1x1 = nn.Conv2d(64, 128, 1)  # channels only
# Dilated convs enlarge the receptive field without extra parameters
dilated_conv = nn.Conv2d(64, 64, 3, padding=2, dilation=2)  # receptive field 5x5
Pitfall 3: BatchNorm in the wrong place
Symptom: unstable training, exploding loss
Correct order:
# Conv -> BatchNorm -> ReLU
nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True)
)
# When fine-tuning a pretrained model, keep its original ordering
Pitfall 4: Ignoring GPU memory management
Symptom: out-of-memory (OOM) errors
Fixes:
# 1. Reduce the batch size
batch_size = 16  # instead of 64
# 2. Gradient accumulation
accumulation_steps = 4
for i, (images, labels) in enumerate(dataloader):
    ...
    loss = criterion(output, labels) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
# 3. Mixed-precision training
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()
for images, labels in dataloader:
    images = images.cuda()
    labels = labels.cuda()
    optimizer.zero_grad()
    with autocast():
        outputs = model(images)
        loss = criterion(outputs, labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
# 4. Free tensors you no longer need
del intermediate_variables
torch.cuda.empty_cache()
10. Chapter Summary
After this chapter, you should have a solid grasp of:
- How convolution works: local connectivity, weight sharing, and parameter efficiency
- Conv layer parameters: computing with kernel size, stride, and padding
- Pooling layers: what max pooling and average pooling do
- Receptive fields: computing the receptive field of each layer
- Classic architectures: the evolution from LeNet to AlexNet to VGGNet
- Modern CNN components: residual connections and related techniques
- CNN visualization: inspecting kernels and feature maps
- Practical tips: avoiding common CNN mistakes
In one sentence: convolutional networks process image data efficiently through local connectivity and weight sharing; stacks of small kernels can replace large ones, and residual connections make training very deep networks possible.
11. Exercises
- Convolution: implement a multi-channel convolution by hand and verify it against PyTorch
- Receptive field: compute the per-layer receptive field of a given network
- Architecture design: design a fully convolutional network for medical image segmentation
- Initialization: compare how different weight initialization methods affect training
- Modern architectures: analyze the design philosophies of ResNet and EfficientNet
Coming next: Chapter 8, "Modern CNN Architectures: ResNet and EfficientNet", takes a deep look at today's most popular CNN architectures, covering residual learning and compound scaling.
If this chapter helped you, please like, save, and follow. Questions are welcome in the comments.