1 File structure

```
├── cat.jpg
├── data
├── Lenet.pth
├── model.py
├── predict.py
└── train.py
```
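cat.jpg: an image of your own, used to test the model's predictions.
data: directory holding the training and validation datasets.
model.py: defines the LeNet network model.
train.py: loads the datasets and trains the network, computing the loss on the training set and the accuracy on the test set, then saves the trained network parameters.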
2 LeNet network model: model.py

```python
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)        # 3 input channels, 16 kernels of size 5x5
        self.pool1 = nn.MaxPool2d(2, 2)         # 2x2 window, stride 2
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)            # 10 output classes

    def forward(self, x):
        x = F.relu(self.conv1(x))     # (3, 32, 32)  -> (16, 28, 28)
        x = self.pool1(x)             # (16, 28, 28) -> (16, 14, 14)
        x = F.relu(self.conv2(x))     # (16, 14, 14) -> (32, 10, 10)
        x = self.pool2(x)             # (32, 10, 10) -> (32, 5, 5)
        x = x.view(-1, 32 * 5 * 5)    # flatten to a vector of length 800
        x = F.relu(self.fc1(x))       # 800 -> 120
        x = F.relu(self.fc2(x))       # 120 -> 84
        x = self.fc3(x)               # 84 -> 10 (raw class scores)
        return x


if __name__ == '__main__':
    # Print the structure only when model.py is run directly,
    # not when train.py or predict.py import it.
    myLeNet = LeNet()
    print(myLeNet)
```
In PyTorch, tensors (that is, the inputs and outputs of the layers) use the dimension order [batch, channel, height, width].
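2.1 Creating a network

To define a neural network in PyTorch, we create a class that inherits from `nn.Module`. We define the network's layers in the `__init__` function and specify how data passes through the network in the `forward` function. To speed up the network's operations, we move it to the GPU or MPS if one is available.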
```python
import torch.nn as nn
import torch.nn.functional as F


# Option 1: register each layer as an attribute and call F.relu in forward()
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))


model1 = Model()
print(model1)


# Option 2: wrap the layers, activations included, in nn.Sequential
class Model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.SeqModel = nn.Sequential(
            nn.Conv2d(1, 20, 5),
            nn.ReLU(),
            nn.Conv2d(20, 20, 5),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.SeqModel(x)
        return x


model2 = Model2()
print(model2)
```
```
Model(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 20, kernel_size=(5, 5), stride=(1, 1))
)
Model2(
  (SeqModel): Sequential(
    (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): Conv2d(20, 20, kernel_size=(5, 5), stride=(1, 1))
    (3): ReLU()
  )
)
```
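In `Model`, `F.relu` is applied in the `forward` method to the outputs of `self.conv1` and `self.conv2`. However, printing a model shows only its structure, that is, the submodules registered in `__init__`, and not the operations applied to each layer in `forward`, so `F.relu` does not appear in the printed output.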
```
class torch.nn.Sequential(*args: Module)
class torch.nn.Sequential(arg: OrderedDict[str, Module])
```
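A sequential container. Modules are added to it in the order they are passed to the constructor. Alternatively, an OrderedDict of modules can be passed in. The forward() method of Sequential accepts any input and forwards it to the first module it contains. It then chains each output to the input of the next module in order, and finally returns the output of the last module.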
```python
from collections import OrderedDict

import torch.nn as nn

# Using Sequential with positional modules: they are indexed 0..3
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)

# Using Sequential with an OrderedDict: each module gets an explicit name
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))
```
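The chaining that Sequential's forward() performs can be imitated in a few lines of plain Python. This is only an illustrative sketch: `TinySequential`, `double` and `relu` are hypothetical names invented here, and the real `nn.Sequential` additionally registers its submodules and their parameters.

```python
from collections import OrderedDict


class TinySequential:
    """Hypothetical stand-in that mimics only the call chaining of nn.Sequential."""

    def __init__(self, *stages):
        # Accept positional callables, or a single OrderedDict of named callables.
        if len(stages) == 1 and isinstance(stages[0], OrderedDict):
            self.stages = OrderedDict(stages[0])
        else:
            self.stages = OrderedDict((str(i), s) for i, s in enumerate(stages))

    def __call__(self, x):
        # Feed the output of each stage into the next one, in insertion order.
        for stage in self.stages.values():
            x = stage(x)
        return x


def double(x):
    return 2 * x


def relu(x):
    return max(0, x)


by_position = TinySequential(double, relu)
by_name = TinySequential(OrderedDict([("double", double), ("relu", relu)]))

print(by_position(3), by_position(-3))   # 6 0
print(list(by_name.stages))              # ['double', 'relu']
```

As in the two PyTorch forms above, positional stages end up keyed by their index, while the OrderedDict form keeps the names you chose.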
2.1 Convolution: Conv2d

The commonly used 2D convolution (Conv2d) corresponds to the following class in PyTorch:

```python
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
```
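kernel_size: the size of the convolution kernel. It can be an int, e.g. 3 means height = width = 3, or a tuple, e.g. (3, 5) means height = 3 and width = 5.
padding: zero padding, 0 by default. It can be an int, e.g. 1 pads one ring of zeros around the input, or a tuple, e.g. (2, 1) pads 2 rows on the top and bottom and 1 column on the left and right.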
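The size of the output after a convolution is given by

$Output = \frac{W - F + 2P}{S} + 1$

where
input image size: W×W
kernel (filter) size: F×F
stride: S
number of padding pixels: P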
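As a worked check of the formula, the sketch below traces the spatial size of a 32×32 CIFAR10 image through the four layers of the LeNet defined earlier. The helper name `conv_output_size` is made up for this example. It assumes floor division when the result is not an integer, and that max pooling with no padding follows the same size formula as convolution.

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width for input width w, kernel size f, padding p and stride s."""
    return (w - f + 2 * p) // s + 1


# (layer name, kernel size F, padding P, stride S), taken from model.py
layers = [
    ("conv1", 5, 0, 1),
    ("pool1", 2, 0, 2),
    ("conv2", 5, 0, 1),
    ("pool2", 2, 0, 2),
]

size = 32  # CIFAR10 images are 32x32
sizes = {}
for name, f, p, s in layers:
    size = conv_output_size(size, f, p, s)
    sizes[name] = size

print(sizes)  # {'conv1': 28, 'pool1': 14, 'conv2': 10, 'pool2': 5}
```

The final 5×5 size, together with the 32 output channels of conv2, is where the `32*5*5` input size of the first fully connected layer comes from.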
2.2 Pooling: MaxPool2d

Max pooling (MaxPool2d) corresponds to the following class in PyTorch:

```python
MaxPool2d(kernel_size, stride)
```
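To make the operation concrete, here is a minimal plain-Python sketch of max pooling on a single 2D feature map. The function `max_pool_2d` and the sample values are illustrative assumptions, not PyTorch code; it assumes a square window and no padding.

```python
def max_pool_2d(image, kernel_size, stride):
    """Max-pool a 2D list of numbers with a square window and no padding."""
    out_h = (len(image) - kernel_size) // stride + 1
    out_w = (len(image[0]) - kernel_size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values under the window whose top-left corner
            # sits at (i * stride, j * stride), then keep the largest.
            window = [image[i * stride + di][j * stride + dj]
                      for di in range(kernel_size)
                      for dj in range(kernel_size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]
pooled = max_pool_2d(feature_map, kernel_size=2, stride=2)
print(pooled)  # [[6, 2], [7, 9]]
```

With `kernel_size=2` and `stride=2`, as in the LeNet above, each side of the feature map is halved.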
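2.3 Flattening a tensor: view()

Note that after the second pooling layer, each sample is still a three-dimensional tensor of shape (32, 5, 5). It must first be flattened into a vector of length 32*5*5 = 800, which `forward` does with `x.view(-1, 32*5*5)`, before being passed to the fully connected layers.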
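The arithmetic behind the flatten can be checked without PyTorch. The sketch below assumes the usual row-major (C, H, W) memory order of a contiguous tensor; `flat_index` is a hypothetical helper that shows where each element ends up in the flattened vector.

```python
channels, height, width = 32, 5, 5
flat_length = channels * height * width   # matches in_features of fc1


def flat_index(c, h, w):
    """Position of element (c, h, w) after a row-major flatten of (C, H, W)."""
    return c * height * width + h * width + w


print(flat_length)            # 800
print(flat_index(0, 0, 0))    # 0
print(flat_index(1, 0, 0))    # 25
print(flat_index(31, 4, 4))   # 799
```

The `-1` in `view(-1, 32*5*5)` leaves the batch dimension to be inferred, so a batch of N images becomes a tensor of shape (N, 800).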
2.4 Fully connected layer: Linear

The fully connected layer (Linear) corresponds to the following class in PyTorch:

```python
Linear(in_features, out_features, bias=True)
```
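One way to see what these layer definitions imply is to count learnable parameters by hand. The sketch below does this for the LeNet in model.py, assuming every layer keeps its bias (the default `bias=True`); the helper names are invented for this example.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights (out * in * k * k) plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features


param_counts = {
    "conv1": conv2d_params(3, 16, 5),
    "conv2": conv2d_params(16, 32, 5),
    "fc1": linear_params(32 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total = sum(param_counts.values())

print(param_counts)
# {'conv1': 1216, 'conv2': 12832, 'fc1': 96120, 'fc2': 10164, 'fc3': 850}
print(total)  # 121182
```

Most of the parameters sit in `fc1`, which is typical for small convolutional networks that flatten into a wide fully connected layer.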
3 Model training: train.py

```python
import time

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

from model import LeNet


def main():
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # Training set
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=100,
                                               shuffle=True, num_workers=0)

    # Test set; only its first batch of 5000 images is used for validation
    val_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=5000,
                                             shuffle=False, num_workers=0)
    val_data_iter = iter(val_loader)
    val_image, val_label = next(val_data_iter)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device)

    net = LeNet()
    if torch.cuda.is_available():
        net.to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.001)

    for epoch in range(5):  # loop over the training set 5 times
        time_start = time.perf_counter()
        running_loss = 0.0
        for step, data in enumerate(train_loader, start=0):
            inputs, labels = data

            # Clear the gradients accumulated by the previous step
            optimizer.zero_grad()

            # Forward pass, loss, backward pass, parameter update
            if torch.cuda.is_available():
                outputs = net(inputs.to(device))
                loss = loss_function(outputs, labels.to(device))
            else:
                outputs = net(inputs)
                loss = loss_function(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if step % 500 == 499:  # report every 500 steps
                with torch.no_grad():  # no gradients needed for evaluation
                    if torch.cuda.is_available():
                        outputs = net(val_image.to(device))
                        predict_y = torch.max(outputs, dim=1)[1]
                        accuracy = torch.eq(predict_y, val_label.to(device)).sum().item() / val_label.size(0)
                    else:
                        outputs = net(val_image)
                        predict_y = torch.max(outputs, dim=1)[1]
                        accuracy = torch.eq(predict_y, val_label).sum().item() / val_label.size(0)

                print('[%d, %5d] train_loss: %.3f test_accuracy: %.3f' %
                      (epoch + 1, step + 1, running_loss / 500, accuracy))
                print('%f s' % (time.perf_counter() - time_start))
                running_loss = 0.0

    print('Finished Training')

    # Save only the learned parameters, not the whole model object
    save_path = './Lenet.pth'
    torch.save(net.state_dict(), save_path)


if __name__ == '__main__':
    main()
```
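3.1 Data preprocessing

The input images are preprocessed in two steps. `ToTensor()` converts an image of shape (H x W x C) with values in the range [0, 255] to a tensor of shape (C x H x W) with values in the range [0.0, 1.0]. `Normalize` with a mean of 0.5 and a standard deviation of 0.5 per channel then maps those values to the range [-1.0, 1.0].

```python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
```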
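The effect of the two transforms on a single pixel value can be reproduced by hand. The sketch below is plain Python arithmetic, not torchvision code; `to_tensor_value` and `normalize` are hypothetical helpers that mirror the scaling done by `ToTensor()` and the `(x - mean) / std` step done by `Normalize`.

```python
def to_tensor_value(pixel):
    """Scale an 8-bit pixel value from [0, 255] to [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, as Normalize does per channel."""
    return (value - mean) / std


for pixel in (0, 128, 255):
    scaled = to_tensor_value(pixel)
    print(pixel, round(scaled, 4), round(normalize(scaled), 4))
# 0 0.0 -1.0
# 128 0.502 0.0039
# 255 1.0 1.0
```

With mean 0.5 and standard deviation 0.5 the darkest pixel maps to -1.0 and the brightest to 1.0, so the network sees inputs centred around zero.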
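3.2 Dataset

The dataset is loaded with `torchvision.datasets`.
This demo uses CIFAR10, a classic image classification dataset. It is a small dataset for recognizing everyday objects, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton, and contains RGB color images in 10 classes.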
3.2.1 Importing and loading the training set

```python
train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100,
                                           shuffle=True, num_workers=0)
```
3.2.2 Importing and loading the test set

```python
val_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=5000,
                                         shuffle=False, num_workers=0)
```
```
├── cat.jpg
├── data
│   ├── cifar-10-batches-py
│   │   ├── batches.meta
│   │   ├── data_batch_1
│   │   ├── data_batch_2
│   │   ├── data_batch_3
│   │   ├── data_batch_4
│   │   ├── data_batch_5
│   │   ├── readme.html
│   │   └── test_batch
│   └── cifar-10-python.tar.gz
├── Lenet.pth
├── model.py
├── predict.py
├── __pycache__
│   └── model.cpython-310.pyc
└── train.py
```
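In the binary version of the CIFAR-10 dataset, the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin and test_batch.bin each contain 10000 samples. Each sample consists of 3073 bytes: the first byte is the label, and the remaining 3072 bytes are the image data. There are no extra separator bytes between samples, so each of these binary files is exactly 30730000 bytes. Note that the directory listing above shows the Python version downloaded by torchvision, whose batch files hold the same data as pickled objects and have no .bin extension.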
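The record layout described above can be sketched as a small parser. This example is an assumption-laden illustration: it relies on the layout documented for the binary version of CIFAR-10 (the 3072 image bytes are 1024 red values, then 1024 green, then 1024 blue, each in row-major order), and it runs on a synthetic record rather than real data. `parse_record` is a hypothetical helper name.

```python
RECORD_BYTES = 1 + 3 * 32 * 32   # 1 label byte + 3072 pixel bytes = 3073


def parse_record(record):
    """Split one 3073-byte record into (label, red, green, blue)."""
    if len(record) != RECORD_BYTES:
        raise ValueError("expected %d bytes, got %d" % (RECORD_BYTES, len(record)))
    label = record[0]
    pixels = record[1:]
    red, green, blue = pixels[:1024], pixels[1024:2048], pixels[2048:]
    return label, red, green, blue


# Synthetic record for illustration only (not real CIFAR-10 data):
# label 3, then constant red, green and blue planes.
fake_record = bytes([3]) + bytes([10]) * 1024 + bytes([20]) * 1024 + bytes([30]) * 1024
label, red, green, blue = parse_record(fake_record)

print(label, red[0], green[0], blue[0])   # 3 10 20 30
print(10000 * RECORD_BYTES)               # 30730000
```

The last line reproduces the file size quoted above: 10000 records of 3073 bytes each.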
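3.3 Training process

epoch: one complete pass of training over all the data in the training set.
batch: because hardware compute is limited, the training set is split into several batches during training; the size of each batch is batch_size.
iteration or step: the process of training on one batch of data.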
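The relationship between these terms is simple arithmetic. The sketch below plugs in the numbers used by the training script in this post (batch_size=100, 5 epochs) and assumes the 50000 images of the CIFAR10 training split; the variable names are illustrative.

```python
import math

train_samples = 50000   # size of the CIFAR10 training split
batch_size = 100
epochs = 5

# A final partial batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Steps (0-based) at which the condition `step % 500 == 499` is true
report_steps = [step for step in range(steps_per_epoch) if step % 500 == 499]

print(steps_per_epoch, total_steps, report_steps)   # 500 2500 [499]
```

So with these settings the script reports the loss and accuracy exactly once per epoch, at the last step.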
```python
# Take the first batch of 5000 test images for validation
val_data_iter = iter(val_loader)
val_image, val_label = next(val_data_iter)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

net = LeNet()
if torch.cuda.is_available():
    net.to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(5):  # loop over the training set 5 times
    time_start = time.perf_counter()
    running_loss = 0.0
    for step, data in enumerate(train_loader, start=0):
        inputs, labels = data

        # Clear the gradients accumulated by the previous step
        optimizer.zero_grad()

        # Forward pass, loss, backward pass, parameter update
        if torch.cuda.is_available():
            outputs = net(inputs.to(device))
            loss = loss_function(outputs, labels.to(device))
        else:
            outputs = net(inputs)
            loss = loss_function(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if step % 500 == 499:  # report every 500 steps
            with torch.no_grad():  # no gradients needed for evaluation
                if torch.cuda.is_available():
                    outputs = net(val_image.to(device))
                    predict_y = torch.max(outputs, dim=1)[1]
                    accuracy = torch.eq(predict_y, val_label.to(device)).sum().item() / val_label.size(0)
                else:
                    outputs = net(val_image)
                    predict_y = torch.max(outputs, dim=1)[1]
                    accuracy = torch.eq(predict_y, val_label).sum().item() / val_label.size(0)

            print('[%d, %5d] train_loss: %.3f test_accuracy: %.3f' %
                  (epoch + 1, step + 1, running_loss / 500, accuracy))
            print('%f s' % (time.perf_counter() - time_start))
            running_loss = 0.0

print('Finished Training')

# Save only the learned parameters
save_path = './Lenet.pth'
torch.save(net.state_dict(), save_path)
```
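The two quantities printed during training can be reproduced in plain Python for a tiny batch. This is a sketch under stated assumptions, not the PyTorch implementation: `cross_entropy` follows the definition that `nn.CrossEntropyLoss` applies to raw scores (log-softmax followed by the negative log-likelihood, averaged over the batch by default), `accuracy` mirrors the `torch.max(..., dim=1)[1]` and `torch.eq` steps, and the logits are made-up values.

```python
import math


def cross_entropy(logits, label):
    """Negative log-softmax of the true class for one sample."""
    peak = max(logits)
    shifted = [z - peak for z in logits]   # subtract the max for numerical stability
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return log_sum_exp - shifted[label]


def accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    predictions = [row.index(max(row)) for row in batch_logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical scores for 3 samples and 3 classes (illustrative values only)
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.5, 0.0]]
labels = [0, 2, 0]

mean_loss = sum(cross_entropy(row, y) for row, y in zip(batch_logits, labels)) / len(labels)
print(round(mean_loss, 3), round(accuracy(batch_logits, labels), 3))
```

Because the loss already includes the softmax, the network's last layer can output raw scores, which is why `forward` in model.py ends with `self.fc3(x)` and no activation.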
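3.3.1 Training on GPU or CPU

The following statement trains on the GPU when one is available and on the CPU otherwise:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```

The device can also be specified explicitly:

```python
device = torch.device("cuda")  # or torch.device("cpu")
```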
4 Predicting with the trained model: predict.py

```python
import torch
import torchvision.transforms as transforms
from PIL import Image

from model import LeNet


def main():
    # Same preprocessing as in training, plus a resize to the 32x32 input size
    transform = transforms.Compose(
        [transforms.Resize((32, 32)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device)

    net = LeNet()
    # map_location lets weights saved on a GPU load on a CPU-only machine
    net.load_state_dict(torch.load('Lenet.pth', map_location=device))
    if torch.cuda.is_available():
        net.to(device)

    im = Image.open('cat.jpg').convert('RGB')  # force 3 channels
    im = transform(im)                         # [C, H, W]
    im = torch.unsqueeze(im, dim=0)            # add batch dimension: [N, C, H, W]

    with torch.no_grad():
        if torch.cuda.is_available():
            outputs = net(im.to(device))
            predict = torch.max(outputs.cpu(), dim=1)[1].numpy()
        else:
            outputs = net(im)
            predict = torch.max(outputs, dim=1)[1].numpy()
    print(outputs)
    print(classes[int(predict[0])])


if __name__ == '__main__':
    main()
```
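The `outputs` printed by predict.py are raw scores, one per class. The plain-Python sketch below shows how such scores map to a class name and, via softmax, to probabilities. The score values are hypothetical and were not produced by a real model, and the `softmax` helper is written out here for illustration rather than taken from predict.py.

```python
import math

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]   # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical network output for one image (not from a real model)
logits = [-1.2, -0.3, 0.8, 3.1, 0.2, 2.4, 0.5, -0.7, -2.0, -1.5]

probabilities = softmax(logits)
best = max(range(len(logits)), key=lambda i: logits[i])   # index of the largest score

print(classes[best], round(probabilities[best], 3))
```

Softmax preserves the ordering of the scores, so taking the index of the largest raw score, as `torch.max(outputs, dim=1)[1]` does, gives the same class as taking the most probable one.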