These notes were originally based mainly on Mu Li's Dive into Deep Learning. Because the book's code is hard to read (much of it is hidden inside the d2l package) and several explanations are vague, later parts draw heavily on other articles, so the style varies quite a bit; apologies for that.

Experiments were run with Python 3.11 and PyCharm.


For reference:
BP(Back Propagation)神经网络——原理篇 (i.e. this is just back propagation)
零基础入门深度学习(2) - 线性单元和梯度下降

Linear Regression

Regression refers to a family of methods that model the relationship between one or more independent variables and a dependent variable.

Mathematical definition of the linear regression model

Suppose there are $n$ observations, each with $p$ features. Write the matrix of independent variables as $X \in \mathbb{R}^{n \times p}$ and the response as $y \in \mathbb{R}^n$.

The linear regression model takes the form:

$$
y = X \mathbf{w} + b + \boldsymbol{\varepsilon}
$$

where:

  • $y = [y_1, y_2, \dots, y_n]^\top$: the dependent (target) variable
  • $X$: the matrix of independent variables; each row $x_i \in \mathbb{R}^p$ is one sample's feature vector
  • $\mathbf{w} \in \mathbb{R}^p$: the model parameters (weight vector)
  • $b$: the bias (intercept) term, broadcast across all samples
  • $\boldsymbol{\varepsilon} \in \mathbb{R}^n$: the random error term; its components $\varepsilon_i$ appear in the assumptions below

Classical assumptions of linear regression (the Gauss–Markov assumptions)

  1. Linearity
    The model is linear in the parameters.

  2. Independence
    All error terms $\varepsilon_i$ are mutually independent.
    This matters especially for time series: there must be no autocorrelation.

  3. Homoscedasticity
    The spread of the model's errors is the same regardless of the size or position of the sample values. Together with the other assumptions, this is what makes the least-squares estimate the best linear unbiased estimator (BLUE). All error terms share the same variance:
    $$
    \operatorname{Var}(\varepsilon_i) = \sigma^2, \quad \forall i
    $$
    Homoscedasticity is assumed so that the least-squares derivation goes through and the mathematics stays simple: we "temporarily assume" that the error fluctuation is the same for every observation.
    Precisely because every point has the same error variance, all observations can be treated as equally reliable, so the loss function is just the least-squares sum of squared errors, and its optimum can be obtained in one step from the normal equation (a closed-form solution; see the sketch after this list).

  4. Zero-mean errors (Unbiased Errors)
    On average the model has no systematic bias: the residuals fluctuate evenly around zero, so predictions are neither systematically too high nor too low, and the estimate of $\mathbf{w}$ is unbiased:
    $$
    \mathbb{E}[\varepsilon_i] = 0, \quad \forall i
    $$

  5. No Perfect Multicollinearity
    The columns of $X$ are linearly independent, i.e.:
    $$
    X^\top X \text{ is invertible (full rank)}
    $$

  6. Normality of Errors
    If confidence intervals, $t$-tests and other statistical inference are needed, the error terms must be assumed normally distributed:
    $$
    \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
    $$

    Note: this is not required for the least-squares solution to exist, but it is a prerequisite for computing confidence intervals, $p$-values and similar inference quantities.
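
Assumption 3 above mentions that the least-squares optimum has a closed-form solution via the normal equation. A minimal NumPy sketch (the toy data and variable names are made up for illustration; it assumes $X^\top X$ is invertible, i.e. assumption 5 holds):

import numpy as np

# hypothetical toy data: 5 samples, 2 features
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0], [5.0, 2.5]])
y = np.array([5.0, 4.2, 7.1, 10.3, 11.0])

# append a column of ones so the intercept b is estimated together with w
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# normal equation: theta = (X^T X)^{-1} X^T y
# lstsq solves the same least-squares problem but is numerically safer than an explicit inverse
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_hat, b_hat = theta[:-1], theta[-1]
print(w_hat, b_hat)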

The normal distribution, also known as the Gaussian distribution

The squared-loss objective can be interpreted through an assumption about the noise distribution.
One assumption under which the mean squared error loss (squared loss for short) is appropriate for linear regression is that the observations contain noise, and that this noise follows a normal distribution.

Density of the standard normal distribution (mean 0, variance 1):

$$
f(x) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^2}{2}}
$$

The general form: the probability density function of a normal distribution with mean $\mu$ and variance $\sigma^2$:
$$
f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}
$$
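
A short derivation (standard, not spelled out in the original notes) of why this noise assumption leads to the squared loss: assume $y = \mathbf{w}^\top \mathbf{x} + b + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$. The likelihood of observing $y$ given $\mathbf{x}$ is

$$
P(y \mid \mathbf{x}) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\left( -\frac{(y - \mathbf{w}^\top \mathbf{x} - b)^2}{2 \sigma^2} \right)
$$

and the negative log-likelihood over $n$ independent observations is

$$
-\log P(\mathbf{y} \mid \mathbf{X}) = \sum_{i=1}^{n} \left( \frac{1}{2} \log(2 \pi \sigma^2) + \frac{1}{2 \sigma^2} \bigl( y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} - b \bigr)^2 \right).
$$

Since $\sigma$ does not depend on $\mathbf{w}$ or $b$, minimizing this expression is equivalent to minimizing the sum of squared errors, which is exactly the mean squared error objective.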

Mean squared error loss (squared loss for short)

Neural network diagram

For linear regression, every input is connected to every output; this transformation is called a fully-connected layer (also known as a dense layer).
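
A minimal PyTorch sketch of such a fully-connected layer (the sizes are arbitrary, chosen only to show the shapes):

import torch
from torch import nn

layer = nn.Linear(3, 2)    # 3 inputs, 2 outputs: every input is connected to every output
x = torch.randn(4, 3)      # a batch of 4 samples with 3 features each
print(layer(x).shape)      # torch.Size([4, 2])
print(layer.weight.shape)  # torch.Size([2, 3]): one row of weights per output unit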

Softmax regression

Softmax regression is the general form of logistic regression, used for multi-class classification. It is a single-layer neural network with a direct mapping from inputs to outputs.

$$
\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^N e^{z_j}}
$$

  • $z_i$: the logit (unnormalized score) of class $i$.
  • $e^{z_i}$: exponentiating the logit amplifies differences and guarantees non-negativity.
  • $\sum_{j=1}^N e^{z_j}$: the normalization factor, ensuring the output probabilities sum to 1.
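
A small worked sketch, assuming made-up logits for three classes, checking that the outputs form a probability distribution:

import torch

z = torch.tensor([2.0, 1.0, 0.1])          # hypothetical logits for 3 classes
probs = torch.exp(z) / torch.exp(z).sum()
print(probs)        # roughly tensor([0.6590, 0.2424, 0.0986])
print(probs.sum())  # tensor(1.)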

Plotting with plt (matplotlib.pyplot)

import numpy as np
import matplotlib.pyplot as plt

# x, y, tangent_line, i and y0 are undefined in the original snippet; illustrative stand-ins (f(x) = x^2, tangent at x = 1):
x = np.linspace(-2, 4, 200)
y = x ** 2
i, y0 = 1.0, 1.0
tangent_line = 2 * i * (x - i) + y0

plt.figure(figsize=(8, 6))  # set the figure size
plt.plot(x, y, label="function", color="blue")  # plot the function curve with a label and color
plt.plot(x, tangent_line, '--', label="tangent", color="red")  # plot the tangent as a dashed line
plt.scatter(i, y0, color="black", label="tangent point")  # mark the tangent point
plt.title("Function and its tangent")  # figure title
plt.xlabel("x")  # x-axis label
plt.ylabel("y")  # y-axis label
plt.legend()  # show the legend
plt.grid(True)  # show the grid
plt.show()  # display the figure

Gaussian probability density function (NumPy)

$$
f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}
$$

import numpy as np

def normal(x, mu=0.0, sigma=1.0):
    x = np.asarray(x, dtype=np.float64)  # ensure a standard NumPy array
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (np.sqrt(2 * np.pi) * sigma)  # normalization coefficient
    exponent = -0.5 * ((x - mu) / sigma) ** 2   # exponent
    return coeff * np.exp(exponent)

Linear Neural Networks

Linear Regression Implementation from Scratch

Complete code for linear regression from scratch
import torch
import random

def synthetic_data(w, b, num_examples):
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

def linreg(X, w, b):
    return torch.matmul(X, w) + b

def sgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

def main():
    true_w = torch.tensor([2, -3.4])
    true_b = 4.2
    features, labels = synthetic_data(true_w, true_b, 1000)
    w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    batch_size = 10
    lr = 0.03
    num_epochs = 3
    net = linreg
    loss = squared_loss
    for epoch in range(num_epochs):
        for X, y in data_iter(batch_size, features, labels):
            l = loss(net(X, w, b), y)
            l.sum().backward()
            sgd([w, b], lr, batch_size)
        with torch.no_grad():
            train_l = loss(net(features, w, b), labels)
            print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

if __name__ == '__main__':
    main()

Generating the Dataset

The noise is the random error on the target y; under the assumption that it follows a normal distribution, it is added after the weighted sum of features and bias rather than being part of that sum.

My own take: noise exists because reality involves far too many factors to enumerate, so we simply add small random perturbations to stand in for them. Since the known data should dominate, the noise cannot be too large, but if it is too small the data become unrealistically ideal; how much to add depends on the situation.

y is a one-dimensional vector of length n, but it needs to be reshaped into an n×1 two-dimensional matrix. The model's predictions are a two-dimensional tensor, and the labels must have the same shape for the loss computation to match; otherwise it raises a shape error. The two-dimensional form means one label per sample and determines whether the tensor can be broadcast, accepted by a linear layer, or concatenated/multiplied with other tensors.
In y.reshape((-1, 1)), the -1 means "infer this dimension automatically", i.e. the number of samples is computed for you.

def synthetic_data(w, b, num_examples):
    X = torch.normal(0, 1, (num_examples, len(w)))  # num_examples samples, len(w) features
    y = torch.matmul(X, w) + b                      # weighted sum plus bias: each sample's (noise-free) label
    y += torch.normal(0, 0.01, y.shape)             # add random perturbation (noise)
    return X, y.reshape((-1, 1))                    # keep the labels shaped exactly like the model output

Reading the Dataset

The raw data are not in random order, and training usually runs for multiple epochs; reshuffling the samples at the start of every epoch improves generalization and prevents the model from picking up bias from the ordering.

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # samples are read in random order, with no fixed sequence
    random.shuffle(indices)  # shuffles the list in place
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])  # simple boundary handling for the last batch
        yield features[batch_indices], labels[batch_indices]
        # yield returns a value without leaving the function, so data can be produced while iterating;
        # commonly used for large datasets

Initializing Model Parameters

w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)  # break symmetry: identical weights would stall training
b = torch.zeros(1, requires_grad=True)  # b is 1-D, but broadcasting effectively treats it as 2-D during computation
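
A minimal sketch (arbitrary values) of the broadcasting mentioned in the comment: torch.matmul(X, w) has shape (n, 1) while b has shape (1,), so adding them broadcasts b across every row:

import torch

X = torch.randn(5, 2)
w = torch.randn(2, 1)
b = torch.zeros(1)

out = torch.matmul(X, w) + b  # (5, 1) + (1,) is broadcast to (5, 1)
print(out.shape)              # torch.Size([5, 1])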

Defining the Model

$$
f(\mathbf{X}) = \mathbf{X} \mathbf{w} + b
$$

def linreg(X, w, b):
    return torch.matmul(X, w) + b  # the classic linear model

Defining the Loss Function

$$
L(\hat{y}, y) = \frac{1}{2} (\hat{y} - y)^2
$$

def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2  # squared loss

Defining the Optimization Algorithm

Minibatch stochastic gradient descent (SGD).
PyTorch accumulates gradients by default; presumably this is so that gradients from several backward passes can be combined into a single update (as in gradient accumulation), which is why they must be cleared manually after each step (see the small demonstration after the code below).

def sgd(params, lr, batch_size):
    with torch.no_grad():  # disable gradient tracking so the update does not modify the computational graph
        for param in params:
            param -= lr * param.grad / batch_size  # the in-place update (-=) saves memory
            param.grad.zero_()  # gradients accumulate by default, so clear them manually
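
A tiny sketch (not from the original notes) of the accumulation behaviour that makes the manual zeroing necessary: PyTorch adds each new gradient onto whatever is already stored in .grad:

import torch

x = torch.tensor(2.0, requires_grad=True)

(x * x).backward()   # d(x^2)/dx = 2x = 4
print(x.grad)        # tensor(4.)

(x * x).backward()   # without zeroing, the new gradient is added on top
print(x.grad)        # tensor(8.)

x.grad.zero_()       # reset before the next update step
print(x.grad)        # tensor(0.)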

Training: the main function and its parameters

In statistical modeling we assume the data are generated by some true probability distribution or function with a set of unknown but fixed parameters; these are what we call the "true parameters".

True parameters are only useful in teaching demos, algorithm validation and theoretical analysis; in real applications they simply do not exist and are not needed.
In other words, they guarantee faithfulness, not correctness: since the true parameters here are specified by hand, they could perfectly well be wrong. What really matters is recovering the underlying function from the data; a known true function just serves as an answer key for checking whether other solution methods work.

Although the loss is a scalar, it is a function of the parameters.

For reference:
pytorch——计算图与动态图机制 (computational graphs and the dynamic graph mechanism in PyTorch)
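
A minimal sketch of that point (shapes and values are arbitrary): even though the loss is a single scalar, calling backward() on it fills in a gradient entry for every parameter it depends on:

import torch

w = torch.randn(3, 1, requires_grad=True)
X = torch.randn(4, 3)
y = torch.randn(4, 1)

loss = ((X @ w - y) ** 2).sum()  # a single scalar
loss.backward()
print(loss.shape)    # torch.Size([]) -- a 0-d tensor
print(w.grad.shape)  # torch.Size([3, 1]) -- one gradient entry per parameter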

def main():
    true_w = torch.tensor([2, -3.4])  # true parameters
    true_b = 4.2
    features, labels = synthetic_data(true_w, true_b, 1000)

    w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)

    batch_size = 10
    lr = 0.03  # learning rate
    num_epochs = 3
    net = linreg
    loss = squared_loss
    # epoch: one full pass over the training set
    for epoch in range(num_epochs):
        for X, y in data_iter(batch_size, features, labels):
            l = loss(net(X, w, b), y)  # net is simply the linear function linreg
            l.sum().backward()  # compute the overall descent direction
            # l has shape (batch_size, 1), but backward() only works on a scalar loss, hence the sum()
            sgd([w, b], lr, batch_size)  # update the parameters using their gradients
        with torch.no_grad():
            train_l = loss(net(features, w, b), labels)  # evaluate on the full dataset
            print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

Concise Implementation of Linear Regression

A deep learning framework gives a more concise way to implement the same model.

Complete code for linear regression with the framework
import torch
from torch.utils import data
from d2l import torch as d2l
from torch import nn

def load_array(data_arrays, batch_size, is_train=True):
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

def main():
    true_w = torch.tensor([2, -3.4])
    true_b = 4.2
    features, labels = d2l.synthetic_data(true_w, true_b, 1000)
    batch_size = 10
    data_iter = load_array((features, labels), batch_size)

    # print one minibatch of samples (optional)
    batch = next(iter(data_iter))
    print('example batch:', batch)

    net = nn.Sequential(nn.Linear(2, 1))
    net[0].weight.data.normal_(0, 0.01)
    net[0].bias.data.fill_(0)

    loss = nn.MSELoss()

    trainer = torch.optim.SGD(net.parameters(), lr=0.03)

    num_epochs = 3
    for epoch in range(num_epochs):
        for X, y in data_iter:
            l = loss(net(X), y)
            trainer.zero_grad()
            l.backward()
            trainer.step()
        with torch.no_grad():
            train_l = loss(net(features), labels)
            print(f'epoch {epoch + 1}, loss {train_l:f}')

    w = net[0].weight.data
    b = net[0].bias.data
    print('estimation error of w:', true_w - w.reshape(true_w.shape))
    print('estimation error of b:', true_b - b)

if __name__ == '__main__':
    main()

Generating the Dataset

import numpy as np
import torch
from torch.utils import data  # PyTorch's data utilities
from d2l import torch as d2l

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = d2l.synthetic_data(true_w, true_b, 1000)  # see the previous section

Reading the Dataset

This constructs a paired dataset object that ties the features and labels together, making batched reading and training convenient.

  1. data_arrays is a tuple or list containing the data tensors, e.g. (features, labels);
  2. *data_arrays unpacks it, so each tensor is passed to TensorDataset as a separate argument and TensorDataset can pair features and labels into samples;
  3. data.TensorDataset: combines several tensors into one dataset whose samples are (feature, label) pairs;
  4. data.DataLoader: wraps the dataset in a loader that reads it in batches; batch_size is the number of samples per batch;
  5. shuffle=is_train: shuffle the data during training, otherwise keep the original order.

    Unpacking is just there to make the arguments line up so the function works as intended; the tuple the function sees internally is only Python's argument-collection mechanism and is not the same thing as the tuple outside, so the two should not be confused (a small sketch follows below).
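
A tiny sketch (toy tensors, not from the original notes) of what the * unpacking does here:

import torch
from torch.utils import data

features = torch.arange(6.0).reshape(3, 2)
labels = torch.tensor([[0.0], [1.0], [2.0]])

data_arrays = (features, labels)
dataset = data.TensorDataset(*data_arrays)  # equivalent to TensorDataset(features, labels)
print(dataset[0])  # (tensor([0., 1.]), tensor([0.]))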

# construct a PyTorch data iterator
def load_array(data_arrays, batch_size, is_train=True):  # data_arrays holds the data tensors
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)
next(iter(data_iter))  # next fetches the first item from the iterator

Defining the Model

nn.Sequential is a PyTorch container that chains several layers together and runs them in order.
nn.Linear(2, 1) is a fully-connected (linear) layer: the input dimension is 2 (each input has 2 features) and the output dimension is 1 (a single number y).

# nn: neural networks
from torch import nn  # import torch's nn submodule directly, so we can write nn.Linear instead of torch.nn.Linear
net = nn.Sequential(nn.Linear(2, 1))

Initializing Model Parameters

A trailing underscore (_) in a method name is a naming convention indicating that the method is an in-place operation.

net[0].weight.data.normal_(0, 0.01)  # weight
net[0].bias.data.fill_(0)            # bias

Defining the Loss Function

Mean squared error is computed with the MSELoss class, also known as the squared L2 norm. By default it returns the average of the losses over all samples.

loss = nn.MSELoss()
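
A small sketch (toy numbers) of the reduction behaviour mentioned above; by default MSELoss averages over all elements:

import torch
from torch import nn

y_hat = torch.tensor([1.0, 2.0, 4.0])
y = torch.tensor([1.0, 1.0, 1.0])

print(nn.MSELoss()(y_hat, y))                  # tensor(3.3333): mean of (0, 1, 9)
print(nn.MSELoss(reduction='sum')(y_hat, y))   # tensor(10.)
print(nn.MSELoss(reduction='none')(y_hat, y))  # tensor([0., 1., 9.])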

Defining the Optimization Algorithm

The optimizer: stochastic gradient descent (SGD).

  1. torch.optim.SGD: PyTorch's implementation of the stochastic gradient descent optimization algorithm.
  2. net.parameters(): retrieves the learnable parameters of net (weights and biases).
  3. lr=0.03: the learning rate, controlling the step size of each parameter update; 0.03 is a common starting value.
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

Training

num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)   # loss (prediction error) on the current batch
        trainer.zero_grad()   # zero the gradients so they do not accumulate
        trainer.step() if False else None  # placeholder comment removed below
    # features: the input data used for prediction; labels: the true target values the model should predict
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')
# compare the true parameters that generated the data with the parameters learned from finite data
w = net[0].weight.data
print('estimation error of w:', true_w - w.reshape(true_w.shape))
b = net[0].bias.data
print('estimation error of b:', true_b - b)

The Image Classification Dataset

Complete code for the image classification dataset
import torch
import torchvision
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l

d2l.use_svg_display()  # display images in SVG format instead of the default PNG

def get_fashion_mnist_labels(labels):
    """Return the text labels for the Fashion-MNIST dataset"""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if torch.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])
    return axes

def main():
    batch_size = 256
    trans = transforms.ToTensor()
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    print(f"number of training samples: {len(mnist_train)}")
    print(f"number of test samples: {len(mnist_test)}")
    print(mnist_train[0][0].shape)
    X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))
    show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y))
    d2l.plt.show()
    train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                                 num_workers=get_dataloader_workers())
    timer = d2l.Timer()
    for X, y in train_iter:
        continue
    print(f'{timer.stop():.2f} sec')
    train_iter, test_iter = load_data_fashion_mnist(32, resize=64)
    for X, y in train_iter:
        print(X.shape, X.dtype, y.shape, y.dtype)
        break

def load_data_fashion_mnist(batch_size, resize=None):
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))

def get_dataloader_workers():
    return 4

if __name__ == '__main__':
    main()

Reading the Dataset

  1. trans = transforms.ToTensor() is a PyTorch preprocessing transform: it converts image data from PIL (Python Imaging Library) images or NumPy arrays into tensors.
  2. train selects whether to load the training split (True) or the test split (False).
  3. transform is the image transformation to apply, i.e. the trans defined above.
# ToTensor converts the image data from PIL images to 32-bit floating point tensors,
# dividing by 255 so every pixel value lies between 0 and 1
trans = transforms.ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(
    root="../data", train=True, transform=trans, download=True)
mnist_test = torchvision.datasets.FashionMNIST(
    root="../data", train=False, transform=trans, download=True)
def get_fashion_mnist_labels(labels):
    """Return the text labels for the Fashion-MNIST dataset"""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
# imgs: a list or tensor collection of images; num_rows, num_cols: images per row/column; scale: scaling factor
def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):
    """Plot a list of images"""
    # figsize sets the size of the display area
    figsize = (num_cols * scale, num_rows * scale)
    # _ discards the Figure object (we do not need it); only axes is used
    # d2l.plt.subplots creates a canvas with num_rows x num_cols subplots
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()  # flatten to 1-D so each subplot can be indexed directly by i
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        # imshow does not accept PyTorch tensors directly, so tensors must be converted to NumPy arrays first
        if torch.is_tensor(img):
            # image tensor
            ax.imshow(img.numpy())
        else:
            # PIL image
            ax.imshow(img)
        # hide the x and y axes
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])  # title
    # return axes, the array of subplot axis objects
    return axes
  1. DataLoader loads the data, splitting it into batches that are fed to the model.
  2. iter(…) turns the DataLoader into an iterator from which batches can be drawn.
  3. next(…) fetches the next batch from the iterator; here that is the first batch of the training set.
    X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))
    show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y))

Reading a Minibatch

batch_size = 256

def get_dataloader_workers():  #@save
    """Use 4 worker processes to read the data"""
    return 4

# shuffle=True: reshuffle the dataset at the start of every epoch
# num_workers=get_dataloader_workers(): number of worker processes used to load data in parallel
train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                             num_workers=get_dataloader_workers())
timer = d2l.Timer()
for X, y in train_iter:
    continue
f'{timer.stop():.2f} sec'

Putting It All Together

def load_data_fashion_mnist(batch_size, resize=None):
    trans = [transforms.ToTensor()]  # put the transform in a list so list operations can be used
    if resize:
        trans.insert(0, transforms.Resize(resize))
    # Compose chains several image transforms in order; it is not a transform itself,
    # it just makes sure the image goes through the steps in the right order (a pipeline)
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(  # downloads the dataset automatically if it is missing
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))
# load the Fashion-MNIST dataset with batch size 32 and images resized to 64x64
# iterate over the first batch of the training set, print the shapes and dtypes of inputs and labels, then stop
train_iter, test_iter = load_data_fashion_mnist(32, resize=64)
for X, y in train_iter:
    print(X.shape, X.dtype, y.shape, y.dtype)
    break

Softmax Regression Implementation from Scratch

Complete code for softmax regression from scratch
import matplotlib.pyplot as plt
import torch
from IPython import display
from d2l import torch as d2l

num_inputs = 784
num_outputs = 10
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition

def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

def cross_entropy(y_hat, y):
    return - torch.log(y_hat[range(len(y_hat)), y])

def accuracy(y_hat, y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

def evaluate_accuracy(net, data_iter):
    if isinstance(net, torch.nn.Module):
        net.eval()
    metric = Accumulator(2)
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

class Accumulator:
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

class Animator:
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        display.clear_output(wait=True)

def train_epoch_ch3(net, train_iter, loss, updater):
    if isinstance(net, torch.nn.Module):
        net.train()
    metric = Accumulator(3)
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    return metric[0] / metric[2], metric[1] / metric[2]

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5
    assert 0.7 < train_acc <= 1
    assert 0.7 < test_acc <= 1

def predict_ch3(net, test_iter, n=6):
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

lr = 0.1
def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)

def main():
    batch_size = 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

    X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    print(X.sum(0, keepdim=True), X.sum(1, keepdim=True))

    X = torch.normal(0, 1, (2, 5))
    X_prob = softmax(X)
    print(X_prob, X_prob.sum(1))

    y = torch.tensor([0, 2])
    y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
    print(y_hat[[0, 1], y])
    print(cross_entropy(y_hat, y))
    print(accuracy(y_hat, y) / len(y))

    print(evaluate_accuracy(net, test_iter))

    num_epochs = 10
    train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

    predict_ch3(net, test_iter)
    plt.show()

if __name__ == '__main__':
    main()

Initializing Model Parameters

num_inputs = 784
num_outputs = 10

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

Defining the Softmax Operation

X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# X.sum(...) sums the tensor X along the given dimension
X.sum(0, keepdim=True), X.sum(1, keepdim=True)
def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting is applied here
X = torch.normal(0, 1, (2, 5))
X_prob = softmax(X)
X_prob, X_prob.sum(1)
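
One caveat the implementation above does not handle: torch.exp overflows for large logits (exp(1000) is inf, which turns into nan after the division). A common fix, sketched here as an aside that reuses the softmax defined above rather than as the book's code, is to subtract each row's maximum before exponentiating; mathematically the result is unchanged:

def stable_softmax(X):
    X = X - X.max(dim=1, keepdim=True).values  # shift so the largest logit in each row is 0
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(dim=1, keepdim=True)

big = torch.tensor([[1000.0, 1000.0, 1000.0]])
print(softmax(big))         # tensor([[nan, nan, nan]]) -- the naive version overflows
print(stable_softmax(big))  # tensor([[0.3333, 0.3333, 0.3333]])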

Defining the Model

def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

Defining the Loss Function

y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y_hat[[0, 1], y]
def cross_entropy(y_hat, y):
    return - torch.log(y_hat[range(len(y_hat)), y])

cross_entropy(y_hat, y)

Classification Accuracy

def accuracy(y_hat, y):
    """Compute the number of correct predictions"""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())
accuracy(y_hat, y) / len(y)
def evaluate_accuracy(net, data_iter):  #@save
    """Compute the accuracy of the model on the given dataset"""
    if isinstance(net, torch.nn.Module):
        net.eval()  # set the model to evaluation mode
    metric = Accumulator(2)  # number of correct predictions, total number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]
class Accumulator:  #@save
    """Accumulate sums over n variables"""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
evaluate_accuracy(net, test_iter)

Training

def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    if isinstance(net, torch.nn.Module):  # only call train() when net is a torch.nn.Module instance
        net.train()  # set the model to training mode
    # the Accumulator tracks three quantities: total training loss, total training accuracy, number of samples
    metric = Accumulator(3)
    for X, y in train_iter:
        # compute gradients and update the parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # use PyTorch's built-in optimizer and loss function
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # use the custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # return the training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]
class Animator:
    """Plot data incrementally in an animation"""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        # draw multiple curves incrementally
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # use a lambda to capture the axis-configuration arguments
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # add multiple data points to the plot
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        display.clear_output(wait=True)
def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc
lr = 0.1

def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)
num_epochs = 10
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

Prediction

def predict_ch3(net, test_iter, n=6):
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(
        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

predict_ch3(net, test_iter)

Concise Implementation of Softmax Regression

Newer releases of d2l no longer include d2l.train_ch3; either bring the function in yourself or downgrade the d2l version, see https://discuss.d2l.ai/t/softmax/1793/83

Complete code for the concise softmax regression implementation
import torch
from torch import nn
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

def main():
    net.apply(init_weights)
    loss = nn.CrossEntropyLoss(reduction='none')
    trainer = torch.optim.SGD(net.parameters(), lr=0.1)
    num_epochs = 10
    d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

if __name__ == '__main__':
    main()

The version below instead pulls in the functions defined in the previous section. It does run, and the resulting plots look essentially the same; judge for yourself.

Complete code for the concise softmax regression implementation (self-contained)
import matplotlib.pyplot as plt
from torch import nn
import torch
from IPython import display
from d2l import torch as d2l

num_inputs = 784
num_outputs = 10

def accuracy(y_hat, y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

def evaluate_accuracy(net, data_iter):
    if isinstance(net, torch.nn.Module):
        net.eval()
    metric = Accumulator(2)
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

class Accumulator:
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

class Animator:
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        display.clear_output(wait=True)

def train_epoch_ch3(net, train_iter, loss, updater):
    if isinstance(net, torch.nn.Module):
        net.train()
    metric = Accumulator(3)
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    return metric[0] / metric[2], metric[1] / metric[2]

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))

    train_loss, train_acc = train_metrics
    assert train_loss < 0.5
    assert 0.7 < train_acc <= 1
    assert 0.7 < test_acc <= 1

def predict_ch3(net, test_iter, n=6):
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

def main():
    batch_size = 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
    net.apply(init_weights)
    loss = nn.CrossEntropyLoss(reduction='none')
    trainer = torch.optim.SGD(net.parameters(), lr=0.1)
    num_epochs = 10
    train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
    plt.show(block=True)

if __name__ == '__main__':
    main()

Initializing Model Parameters

# PyTorch does not reshape inputs implicitly, so a Flatten layer is placed before the linear layer
# to adjust the shape of the network input
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)
# the cross-entropy loss function
loss = nn.CrossEntropyLoss(reduction='none')
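
A small sketch (toy logits) of what this loss expects: raw, un-softmaxed logits plus integer class labels; with reduction='none' it returns one loss value per sample, which is why the training loop above calls l.mean().backward():

import torch
from torch import nn

logits = torch.tensor([[2.0, 0.5, 0.1],   # sample 0, true class 0
                       [0.2, 0.1, 3.0]])  # sample 1, true class 2
targets = torch.tensor([0, 2])

loss = nn.CrossEntropyLoss(reduction='none')
print(loss(logits, targets))         # roughly tensor([0.3168, 0.1096]): one loss per sample
print(loss(logits, targets).mean())  # the scalar actually used for backward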

The Optimization Algorithm

# create an SGD optimizer for all of net's trainable parameters with learning rate 0.1
trainer = torch.optim.SGD(net.parameters(), lr=0.1)  # stochastic gradient descent

Training

num_epochs = 10
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)