Deep Learning Computation: Opening the Toolbox and Going from "Basic User" to "Power User"

2401_83600008 · Published 2026-05-25 08:00:00

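The Deep Learning "Toolbox": Layers, Blocks, Parameters, and GPUs

In the previous posts we learned to build networks out of ready-made building blocks. But have you ever wondered how those blocks are made? How do you build your own? How do you save and load a model? How do you speed things up with a GPU?

Today we open the deep learning "toolbox" and move from "basic user" to "power user"! We will use the analogy of assembling a PC throughout: think of the deep learning library as a computer mall, layers as hardware components, blocks as assembled machines, parameters as the hardware configuration, and the GPU as a high-performance graphics card!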

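I. Layers and Blocks: From Parts to a Complete Machine

1. What are layers and blocks?

- Layer: like a single hardware component of a PC (CPU, graphics card, memory)
- Block: like a complete machine assembled from those components

In plain terms:

- A layer is a single part with a single function
- A block is several parts assembled together, with more complete functionality

In PyTorch, nn.Module is the base class of all layers and blocks, much as every piece of hardware has to conform to some standard interface.

2. Custom blocks: assemble your own PC!

Let's define a custom MLP block, just like assembling a PC yourself: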

import torch
from torch import nn
from torch.nn import functional as F

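# A custom MLP block (like assembling your own PC)
class MLP(nn.Module):
    def __init__(self):
        super().__init__()  # the parent constructor must be called first
        self.hidden = nn.Linear(20, 256)  # hidden layer (the "CPU")
        self.out = nn.Linear(256, 10)     # output layer (the "graphics card")

    def forward(self, X):
        # Forward pass: how data flows through the block (like data moving through a PC)
        return self.out(F.relu(self.hidden(X)))

# Quick test
net = MLP()
X = torch.randn(2, 20)  # a batch of 2 inputs with 20 features each
print(net(X))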

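Defining your own block really is that simple! You only need to:

- Subclass nn.Module
- Define the layers in __init__
- Implement the forward function (the forward pass)

The backward pass needs no code of its own: autograd derives it from forward.

3. Sequential blocks: assemble on a production line!

If you only need to chain layers one after another, PyTorch provides nn.Sequential, which works like a production line for assembling PCs: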

import torch
from torch import nn
from torch.nn import functional as F

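# Define the MLP with Sequential (production-line assembly)
net = nn.Sequential(
    nn.Linear(20, 256),  # step 1: install the CPU
    nn.ReLU(),           # step 2: install the cooler
    nn.Linear(256, 10)   # step 3: install the graphics card
)

# Quick test
X = torch.randn(2, 20)  # input data
print(net(X))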

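The advantage of Sequential is that it is simple and intuitive, which makes it a good fit when the layers are connected strictly one after another.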
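To see that there is no magic in a sequential container, here is a rough sketch of how one could be written by hand. The class name MySequential is made up for illustration, and the real nn.Sequential does considerably more (indexing, slicing, named children); this only shows the core idea of registering children and calling them in order.

```python
import torch
from torch import nn


class MySequential(nn.Module):
    """Hypothetical, simplified stand-in for nn.Sequential."""

    def __init__(self, *modules):
        super().__init__()
        for idx, module in enumerate(modules):
            # add_module registers the child so that its parameters are
            # tracked by parameters(), state_dict(), to(), and so on.
            self.add_module(str(idx), module)

    def forward(self, X):
        # children() yields the registered modules in insertion order.
        for module in self.children():
            X = module(X)
        return X


net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.randn(2, 20)
out = net(X)
print(out.shape)
```

Because the children are registered rather than kept in a plain Python list, net.parameters() still finds the weights and biases of both linear layers.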

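4. Running code in the forward pass: flexible assembly!

The forward function can do more than call layers: it can run arbitrary Python code! It is like being free to adjust things as you assemble a PC: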

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)  # constant weight, never trained
        self.linear = nn.Linear(20, 20)
    
    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)  # compute with the constant weight
        X = self.linear(X)  # reuse the same layer (shared parameters)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

net = FixedHiddenMLP()
print(net(X))

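See! Inside forward you can use not only layers but also loops, conditionals, and even constants that take no gradient. This flexibility is one of the great strengths of deep learning frameworks!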
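One caveat about the constant weight: stored as a plain tensor attribute, rand_weight is not registered with the module, so net.to(device) will not move it and state_dict() will not save it. A possible alternative, sketched below, is register_buffer. The class name BufferedHiddenMLP is made up for this example, and the halving loop from the original is left out for brevity.

```python
import torch
from torch import nn
from torch.nn import functional as F


class BufferedHiddenMLP(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # A buffer is part of the module's state but is not a parameter:
        # optimizers never see it, yet state_dict() and .to() include it.
        self.register_buffer("rand_weight", torch.rand(20, 20))
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        return self.linear(X).sum()


net = BufferedHiddenMLP()
param_names = [name for name, _ in net.named_parameters()]
state_keys = list(net.state_dict().keys())
print(param_names)  # only the trainable parameters
print(state_keys)   # parameters plus the buffer
```

Whether the constant should be saved with the model is a design choice; if you want it regenerated on every construction instead, a plain attribute (as in the original) is fine as long as you move it to the right device yourself.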

5. Nested blocks: a PC with a server inside!

Blocks can be nested inside other blocks! It is like fitting a whole server inside your PC:

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU()
        )
        self.linear = nn.Linear(32, 16)
    
    def forward(self, X):
        return self.linear(self.net(X))

# Deep nesting: blocks inside blocks inside blocks
chimera = nn.Sequential(
    NestMLP(), 
    nn.Linear(16, 20), 
    FixedHiddenMLP()
)

print(chimera(X))

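Nesting lets us build networks in a modular way: even a complex network is composed of simple blocks!

II. Parameter Management: Inspecting and Adjusting the Hardware Configuration!

1. Accessing parameters: check the hardware configuration!

Once a model is trained (or while debugging it), we need to inspect its parameters, just like checking the hardware configuration of a PC: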

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

# Parameters of the second fully connected layer (the output layer, at index 2)
print(net[2].state_dict())

Each layer's parameters live in its state_dict, which is like the hardware's spec sheet.

Accessing a specific parameter:

# The weight of the output layer
print(type(net[2].weight))
print(net[2].weight)
print(net[2].weight.data)  # the values only, detached from autograd

# The bias
print(net[2].bias)
print(net[2].bias.data)

# The gradient (None until backward() has been called)
print(net[2].weight.grad is None)

Accessing all parameters at once:

# All parameters, with their names and shapes
print(*[(name, param.shape) for name, param in net.named_parameters()])

# Or look one up directly by name
print(net.state_dict()['2.bias'].data)

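2. Collecting parameters from nested blocks: open up the server and read its configuration!

How do we access the parameters of nested blocks? Simply search recursively: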

import torch
from torch import nn
from torch.nn import functional as F

# Redefine the classes we need (so this code block runs on its own)
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)
    
    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU()
        )
        self.linear = nn.Linear(32, 16)
    
    def forward(self, X):
        return self.linear(self.net(X))

# Build the chimera network
chimera = nn.Sequential(
    NestMLP(), 
    nn.Linear(16, 20), 
    FixedHiddenMLP()
)

# Collect parameters from the nested blocks
print(*[(name, param.shape) for name, param in chimera.named_parameters()])

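No matter how deeply the blocks are nested, named_parameters() finds every registered parameter! Note that rand_weight does not appear in the output, because it is a plain tensor rather than an nn.Parameter.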
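The dotted names printed by named_parameters() mirror the nesting, so a single parameter can be reached either by indexing step by step or by name. Here is a small sketch on a toy network; block1 is a hypothetical helper introduced only for this example.

```python
import torch
from torch import nn


def block1():
    # A small hypothetical building block: Linear -> ReLU.
    return nn.Sequential(nn.Linear(4, 4), nn.ReLU())


# Two levels of nesting: outer Sequential -> inner Sequential -> block1.
net = nn.Sequential(nn.Sequential(block1(), block1()), nn.Linear(4, 1))

names = [name for name, _ in net.named_parameters()]
print(names)

# Index step by step: outer[0] -> inner[1] -> Linear at position 0.
bias_by_index = net[0][1][0].bias

# The same tensor looked up by its dotted name.
bias_by_name = dict(net.named_parameters())["0.1.0.bias"]
print(bias_by_index is bias_by_name)

# Total number of scalar parameters in the whole network.
total = sum(p.numel() for p in net.parameters())
print(total)
```

Counting with numel() like this is a quick way to sanity-check a model's size before training.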

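3. Parameter initialization: give the hardware sensible defaults!

Good parameter initialization matters, much like setting sensible defaults on your hardware.

Default initialization:

PyTorch initializes parameters by default as soon as a layer is constructed. For nn.Linear:

- Weights: drawn from a uniform distribution U(-1/sqrt(in_features), 1/sqrt(in_features))
- Biases: drawn from the same uniform distribution (they are not set to 0)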
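A quick way to convince yourself how nn.Linear is initialized by default is to look at the range of a freshly constructed layer. The sketch below assumes the bound documented for nn.Linear, 1/sqrt(in_features), for both weight and bias; the layer sizes are arbitrary.

```python
import math

import torch
from torch import nn

in_features = 100
layer = nn.Linear(in_features, 50)

# Documented bound for nn.Linear's default initialization.
bound = 1 / math.sqrt(in_features)

# Largest absolute value actually present in the fresh layer.
w_max = layer.weight.detach().abs().max().item()
b_max = layer.bias.detach().abs().max().item()
print(bound, w_max, b_max)
```

Both maxima should stay within the bound, and the bias is visibly nonzero, so if you want zero biases you have to set them yourself (for example with nn.init.zeros_).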

Built-in initializers:

PyTorch also provides built-in initializers in nn.init:

import torch
from torch import nn

# Create a network first
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)  # optional sanity check; nn.Linear parameters already exist after construction

# Initialize from a normal distribution
def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)  # apply() calls init_normal on every submodule, recursively
print(net[0].weight.data[0], net[0].bias.data[0])

Constant initialization:

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net.apply(init_constant)
print(net[0].weight.data[0], net[0].bias.data[0])
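The initializers so far are applied to the whole network, but apply() also works on a single sub-block, so different layers can get different rules. A minimal sketch, using Xavier initialization for the first layer and a constant for the last; the function names init_xavier and init_42 are chosen here for illustration.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))


def init_xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)


def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)


# apply() can be called on a sub-block, so each layer gets its own rule.
net[0].apply(init_xavier)
net[2].apply(init_42)

print(net[0].weight.data[0])
print(net[2].weight.data)
```

Constant weights like 42 are only useful for demonstrations; in a real model they would make every unit in the layer behave identically.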

Custom initialization:

You can also write your own initialization logic:

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in m.named_parameters()][0])
        nn.init.uniform_(m.weight, -10, 10)
        # Custom rule: keep weights with absolute value >= 5, set the rest to 0
        m.weight.data *= (m.weight.data.abs() >= 5).float()

net.apply(my_init)
print(net[0].weight[:2])

Setting parameters directly:

You can even modify parameter values directly:

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

net[0].weight.data[:] += 1  # add 1 to every weight
net[0].weight.data[0, 0] = 42  # set the first weight to 42
print(net[0].weight.data[0])

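4. Parameter tying: two PCs sharing one graphics card!

Sometimes we want several layers to share parameters, like two PCs using the same graphics card: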

import torch
from torch import nn

# The shared layer
shared = nn.Linear(8, 8)
net = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    shared,           # first use of shared
    nn.ReLU(),
    shared,           # second use of shared (the very same object!)
    nn.ReLU(),
    nn.Linear(8, 1)
)

X = torch.rand(size=(2, 4))
net(X)
# Check that the two layers hold identical values
print(net[2].weight.data[0] == net[4].weight.data[0])

# Change one and the other changes too
net[2].weight.data[0, 0] = 100
print(net[2].weight.data[0] == net[4].weight.data[0])

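Parameter tying saves memory and lets a model reuse the same weights at different positions! During backpropagation, the gradients from every position where the shared layer is used are added together.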
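Comparing values only shows that the two layers are equal; the sketch below checks that they are literally the same object, and looks at what that means for parameter counting and gradients. The variable names are chosen here for illustration.

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),
    shared, nn.ReLU(),
    shared, nn.ReLU(),
    nn.Linear(8, 1),
)

# Same Parameter object, not merely equal values.
same_object = net[2].weight is net[4].weight

# parameters() de-duplicates shared tensors: 3 distinct Linear layers,
# each with a weight and a bias.
num_tensors = len(list(net.parameters()))

net(torch.rand(2, 4)).sum().backward()
# There is a single .grad tensor; it holds the sum of the contributions
# from both places where the shared layer was applied.
same_grad = net[2].weight.grad is net[4].weight.grad
print(same_object, num_tensors, same_grad)
```

Since an optimizer receives each shared tensor only once, the tied layer is updated a single time per step with the accumulated gradient.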

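III. Lazy Initialization: Build the Machine First, Work Out the Specs Later!

1. What is lazy initialization?

Have you ever been in this situation: you are defining a network but do not yet know the input dimension?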

import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
print(net[0].weight)  # not initialized yet: prints <UninitializedParameter>

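LazyLinear is lazy initialization in action: it does not know the input dimension yet, so it postpones creating its parameters.

2. Parameters are initialized on the first forward pass!

The first time you pass data through the network, PyTorch infers the input dimension automatically and then initializes the parameters: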

import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

X = torch.rand(2, 20)
net(X)  # first forward pass: the parameters are initialized now!
print(net[0].weight.shape)  # torch.Size([256, 20])

The benefit of lazy initialization is that you do not have to work out each layer's input dimension by hand!
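One practical consequence is that custom initializers cannot touch a lazy layer until its shapes are known. A common pattern, sketched here under the assumption that inputs have 20 features, is to push one dummy batch through the network first and only then call apply().

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

# Before any data is seen, the weight is only a placeholder.
before = isinstance(net[0].weight, UninitializedParameter)

# A dummy batch with the expected feature size (20 is an assumption here).
net(torch.zeros(1, 20))

after = isinstance(net[0].weight, UninitializedParameter)
print(before, after)
print(net[0].weight.shape, net[2].weight.shape)


# With real shapes in place, custom initializers can now be applied.
def init_normal(m):
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)


net.apply(init_normal)
```

The dummy batch only needs the right feature dimension; its values and batch size do not matter for shape inference.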

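IV. Custom Layers: Build Your Own Hardware!

1. Layers without parameters: build a simple part!

Let's build a layer that has no parameters, like making a simple adapter: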

import torch
from torch import nn
from torch.nn import functional as F

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()
    
    def forward(self, X):
        return X - X.mean()  # subtract the mean to center the data

layer = CenteredLayer()
print(layer(torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])))

A layer without parameters really is that simple: you only need to implement forward!

Let's drop this layer into a network and try it out:

import torch
from torch import nn

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()
    
    def forward(self, X):
        return X - X.mean()

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())
Y = net(torch.rand(4, 8))
print(Y.mean())  # should be close to 0 (up to floating-point error)

2. Layers with parameters: build a part with a switch!

Let's build a layer that has parameters, like making a part with a switch on it:

import torch
from torch import nn
from torch.nn import functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))  # weight parameter
        self.bias = nn.Parameter(torch.randn(units,))             # bias parameter
    
    def forward(self, X):
        # Note: the forward pass must use self.weight and self.bias directly.
        # Never write self.weight.data here: that bypasses autograd, so no gradient would flow back to the parameter!
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

# Quick test
linear = MyLinear(5, 3)
print(linear.weight)

# Forward pass
print(linear(torch.rand(2, 5)))

# Use it inside Sequential
net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
print(net(torch.rand(2, 64)))

A layer with parameters needs to:

- Define its parameters with nn.Parameter in __init__
- Use those parameters in the computation in forward
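To check that both requirements are met, you can confirm that the parameters are registered and that gradients actually reach them. A small sketch follows; the extra scratch attribute is hypothetical and is there only to contrast a plain tensor with an nn.Parameter.

```python
import torch
from torch import nn
from torch.nn import functional as F


class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units))
        # A plain tensor attribute, for contrast: it is NOT registered.
        self.scratch = torch.zeros(units)

    def forward(self, X):
        return F.relu(torch.matmul(X, self.weight) + self.bias)


layer = MyLinear(5, 3)
names = [name for name, _ in layer.named_parameters()]
print(names)  # scratch is absent

# Run a forward and backward pass, then inspect the gradients.
layer(torch.rand(2, 5)).sum().backward()
print(layer.weight.grad.shape, layer.bias.grad.shape)
```

If a gradient here were None after backward(), that would be a sign that forward is not using the parameter through autograd.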

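V. Reading and Writing Files: Save and Load Your PC!

1. Saving and loading tensors: save a single part!

Let's start simple, by saving and loading a tensor: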

import torch

# Save a tensor
x = torch.tensor([3.0])
torch.save(x, 'x-file')

# Load the tensor
x2 = torch.load('x-file')
print(x2)

Saving and loading a list of tensors:

import torch

x = torch.tensor([3.0])
y = torch.tensor([4.0])
torch.save([x, y], 'x-files')
x2, y2 = torch.load('x-files')
print(x2, y2)

Saving and loading a dictionary:

import torch

x = torch.tensor([3.0])
y = torch.tensor([4.0])
mydict = {'x': x, 'y': y}
torch.save(mydict, 'mydict')
mydict2 = torch.load('mydict')
print(mydict2)

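2. Saving and loading model parameters: save the whole machine's configuration!

Saving all of a model's parameters is like saving the full configuration of your PC: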

import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)
    
    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

# Create the network and run a forward pass
net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)

# Save the model parameters
torch.save(net.state_dict(), 'mlp.params')

Loading the parameters is like using the configuration file to assemble an identical PC:

import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)
    
    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

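# The network structure must be created first (same as when saving)
clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone.eval()  # switch to evaluation mode

# Run the restored model. This X is new random data, so the output differs from Y above;
# to verify the round trip, feed the same X to both net and clone and compare the outputs.
X = torch.randn(size=(2, 20))
Y_clone = clone(X)
print(Y_clone)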

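Note:

- What is saved is the parameters, not the whole model
- Before loading, you must create a network with the same structure
- eval() switches the model to evaluation mode (dropout is disabled, batch normalization uses its running statistics)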
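To actually verify that saving and loading preserves the model, the original network and the restored one need to see the same input. Here is a self-contained sketch of the full round trip; it writes to a temporary directory, and the map_location argument is an addition of this example so that a checkpoint saved on a GPU could also be loaded on a CPU-only machine.

```python
import os
import tempfile

import torch
from torch import nn
from torch.nn import functional as F


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))


net = MLP()
X = torch.randn(2, 20)
Y = net(X)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mlp.params")  # temporary file for this demo
    torch.save(net.state_dict(), path)

    clone = MLP()
    # map_location="cpu" remaps any saved tensors onto the CPU when loading.
    clone.load_state_dict(torch.load(path, map_location="cpu"))
    clone.eval()

# Same parameters and same input, so the outputs should match.
outputs_match = torch.equal(clone(X), Y)
print(outputs_match)
```

If the class definition changes between saving and loading (for example, a layer is renamed), load_state_dict reports missing or unexpected keys, which is why the structure has to match.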

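VI. GPUs: Install a High-Performance Graphics Card and Take Off!

1. Compute devices: see whether you have a graphics card!

First, check which compute devices are available: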

import torch

# Check for GPUs
print(torch.device('cpu'))
print(torch.cuda.device_count())  # how many GPUs are there?
print(torch.cuda.is_available())  # is CUDA usable?

Choosing a device:

# Return GPU i if it exists, otherwise fall back to the CPU
def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Return every available GPU, or [cpu] if there is none
def try_all_gpus():
    devices = [torch.device(f'cuda:{i}')
             for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

print(try_gpu())
print(try_gpu(10))
print(try_all_gpus())

2. Tensors and GPUs: move the data onto the graphics card!

Creating a tensor on the GPU:

import torch

# Return GPU i if it exists, otherwise fall back to the CPU
def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Create a tensor on GPU 0 (or on the CPU if there is no GPU)
X = torch.ones(2, 3, device=try_gpu())
print(X)

Moving a tensor from the CPU to the GPU:

import torch

def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Create on the CPU
Z = torch.tensor([1, 2, 3])
print(Z.device)

# Move to the GPU
if torch.cuda.is_available():
    Z_gpu = Z.cuda(0)
    print(Z_gpu.device)

    # Or use the to() method
    Z_gpu2 = Z.to('cuda:0')
    print(Z_gpu2.device)

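Note:

- Tensors can only be combined in an operation if they are on the same device
- If X is on GPU 0 and Y is on GPU 1, you cannot add them directly!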
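When two tensors may live on different devices, the usual fix is to copy one to the other's device before combining them. A minimal sketch that also runs on a CPU-only machine (in which case both tensors are simply on the CPU):

```python
import torch


def try_gpu(i=0):
    # Return GPU i if it exists, otherwise fall back to the CPU.
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')


a = torch.ones(2, 3)                    # created on the CPU
b = torch.rand(2, 3, device=try_gpu())  # on GPU 0 if present, else CPU

# Copy a to wherever b lives (no copy is made if it is already there).
a_moved = a.to(b.device)
c = a_moved + b
print(c.device)
```

Transfers between devices are relatively slow, so it is usually better to move data once and keep the computation on one device than to shuttle tensors back and forth.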

Computing on the GPU:

import torch

def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

X = torch.ones(2, 3, device=try_gpu())
Y = torch.rand(2, 3, device=try_gpu())
print(X + Y)  # both on the same device, so the addition works

3. Neural networks and GPUs: move the model onto the graphics card!

Moving a network to the GPU:

import torch
from torch import nn

def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

# Create the network, then move it to the GPU
net = nn.Sequential(nn.Linear(3, 1))
net = net.to(device=try_gpu())

# The input data must be on the same device
X = torch.ones(2, 3, device=try_gpu())
print(net(X))

# Check which device the model parameters are on
print(net[0].weight.data.device)

Remember:

- The model and the data must be on the same device
- Tip: choose the device first, then move both the model and the data to it
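Put together, "choose the device first, then move everything" might look like the following single training step. The toy data, layer sizes, and learning rate are made up for illustration; the point is only where the .to(device) calls go.

```python
import torch
from torch import nn

# 1. Choose the device once.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 2. Move the model, then build the optimizer from its parameters.
net = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1)).to(device)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Toy data (made up for illustration), created on the CPU.
X = torch.rand(16, 3)
y = X.sum(dim=1, keepdim=True)

# 3. Move each batch to the same device right before using it.
X, y = X.to(device), y.to(device)

optimizer.zero_grad()
loss = loss_fn(net(X), y)
loss.backward()
optimizer.step()

print(loss.item(), next(net.parameters()).device)
```

Keeping the device in one variable means the same script runs unchanged on machines with and without a GPU.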

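VII. Summary: From "Basic User" to "Power User"!

Today we learned about:

Layers and blocks:
- A layer is a part; a block is an assembled machine
- Custom blocks: subclass nn.Module and implement forward
- Sequential: a simple block that chains layers in order
- Blocks can be nested and combined freely
Parameter management:
- Accessing parameters: state_dict, named_parameters
- Initializing parameters: built-in and custom methods
- Parameter tying: several layers share the same parameters
Lazy initialization:
- LazyLinear: parameters are initialized on the first forward pass
- No need to work out input dimensions by hand
Custom layers:
- Without parameters: just implement forward
- With parameters: define them with nn.Parameter
Reading and writing files:
- Saving/loading tensors: torch.save, torch.load
- Saving/loading model parameters: state_dict
- A network with the same structure must be created before loading
GPU acceleration:
- Checking devices: torch.device, torch.cuda.is_available()
- Tensors on the GPU: the device argument, the to() method, the cuda() method
- Models on the GPU: net.to(device)
- The model and the data must be on the same device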

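Advice for beginners:

- Blocks are like building bricks: first learn to use the ready-made ones, then learn to make your own
- Parameters are like hardware configuration: they can be inspected and modified
- The GPU is like a high-performance graphics card: it can speed things up considerably, but the model and the data must be on the same device
- Lazy initialization is a handy tool: it saves you a lot of trouble computing dimensions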

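Closing Thoughts

Today we opened the deep learning "toolbox" and moved from "basic user" to "power user"! You can now not only use ready-made building blocks but also build your own, assemble them, save them, and accelerate them with a GPU!

The advanced models that come later (such as convolutional and recurrent neural networks) are, at their core, built from these same tools, just assembled in more ingenious ways!

If you have questions, feel free to leave a comment, and let's keep exploring the deep learning toolbox together!

(Note: parts of this post draw on Dive into Deep Learning.)
Dive into Deep Learning, "Deep Learning Computation": https://zh.d2l.ai/chapter_deep-learning-computation/index.html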
