Fashion MNIST with PyTorch: Comparing CNN Architectures and BatchNorm, Best Accuracy 92.45%
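This article implements Fashion MNIST image classification in PyTorch and compares two-layer and three-layer convolutional networks (CNNs), each with and without BatchNorm. In these experiments the three-layer CNN outperforms the two-layer one overall, reaching a best test accuracy of 92.45%. BatchNorm improves accuracy once the model has trained long enough, although at 5 epochs the models without BatchNorm score higher. The number of epochs should be moderate, since training for too long may lead to overfitting. The best configuration is CNN-3Conv with BatchNorm, trained for 25 epochs. The article provides complete code for data loading, model definition, training, and testing, along with an analysis of the results; the conclusions may carry over to more complex image classification tasks.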
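Fashion MNIST is a classic image classification dataset released by Zalando Research as a replacement for the traditional MNIST handwritten-digit dataset. It contains fashion images in 10 classes. Each image is a 28x28 grayscale picture; the training set has 60,000 images and the test set has 10,000. This article walks through Fashion MNIST classification in PyTorch and compares how different CNN architectures and BatchNorm layers affect model performance.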
Dataset Overview
Fashion MNIST contains the following 10 classes:
| Label | Class |
|---|---|
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |
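Environment Setup
First, make sure PyTorch and torchvision are installed (the command below also installs matplotlib, although the code in this article does not use it):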
pip install torch torchvision matplotlib
Data Loading and Preprocessing
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 100
learning_rate = 0.001

# Preprocessing: convert to a tensor in [0, 1], then normalize to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

# Download and load the dataset
train_dataset = datasets.FashionMNIST(
    root="./data", train=True, transform=transform, download=True
)
test_dataset = datasets.FashionMNIST(
    root="./data", train=False, transform=transform, download=True
)

# Shuffle only the training data; keep the test order fixed
train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Use the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
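To make the preprocessing concrete, here is a minimal sketch in plain Python (no PyTorch needed) of the arithmetic behind `Normalize((0.5,), (0.5,))` and of the number of batches per epoch implied by `batch_size = 100`. The helper names `normalize` and `batches_per_epoch` are illustrative assumptions, not part of torchvision.

```python
import math


def normalize(x, mean=0.5, std=0.5):
    """Per-pixel arithmetic applied by Normalize: (x - mean) / std."""
    return (x - mean) / std


def batches_per_epoch(num_samples, batch_size):
    """Number of batches a loader yields when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


# ToTensor scales pixel values to [0, 1]; Normalize then maps them to [-1, 1].
print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0

train_steps = batches_per_epoch(60000, 100)
test_steps = batches_per_epoch(10000, 100)
print(train_steps, test_steps)  # 600 100
```

With 600 steps per epoch, a progress message every 100 steps gives six log lines per epoch.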
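Model Architecture
Two-Layer Convolutional Network (CNN-2Conv)
This model has two convolutional blocks, each made up of a convolution layer, an optional BatchNorm layer (controlled by use_batchnorm), a ReLU activation, and max pooling. The blocks are followed by dropout and two fully connected layers: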
class CNN2Conv(nn.Module):
    def __init__(self, use_batchnorm=True):
        super().__init__()
        # Block 1: 1x28x28 -> 32x14x14
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Block 2: 32x14x14 -> 64x7x7
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.dropout = nn.Dropout()  # default drop probability p=0.5
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)  # flatten to (batch, 7*7*64)
        out = self.dropout(out)
        # fc1 and fc2 are applied back to back, with no activation in between
        out = self.fc1(out)
        out = self.fc2(out)
        return out  # raw logits, one per class
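The input size of `fc1` (`7 * 7 * 64`) follows from how each layer changes the spatial size of a 28x28 image. The sketch below traces that arithmetic with the standard output-size formula for convolution and pooling layers; the helper name `conv_out` is an assumption made for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension for a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28
size = conv_out(size, kernel=5, stride=1, padding=2)  # layer1 conv: 28 -> 28
size = conv_out(size, kernel=2, stride=2)             # layer1 pool: 28 -> 14
size = conv_out(size, kernel=5, stride=1, padding=2)  # layer2 conv: 14 -> 14
size = conv_out(size, kernel=2, stride=2)             # layer2 pool: 14 -> 7

flat_features = size * size * 64  # 64 output channels from layer2
print(size, flat_features)  # 7 3136
```

A 5x5 kernel with padding 2 keeps the spatial size unchanged, so only the two pooling layers shrink the image, each by a factor of two.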
Three-Layer Convolutional Network (CNN-3Conv)
This model adds a third convolutional block and uses 3x3 kernels to extract richer features:
class CNN3Conv(nn.Module):
    def __init__(self, use_batchnorm=True):
        super().__init__()
        # Block 1: 1x28x28 -> 32x14x14
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Block 2: 32x14x14 -> 64x7x7
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Block 3: 64x7x7 -> 128x3x3 (pooling rounds 7 / 2 down to 3)
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.dropout = nn.Dropout()  # default drop probability p=0.5
        self.fc1 = nn.Linear(3 * 3 * 128, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.reshape(out.size(0), -1)  # flatten to (batch, 3*3*128)
        out = self.dropout(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out  # raw logits, one per class
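As a rough check on the two architectures, the following sketch derives the `3 * 3 * 128` input size of `fc1` in CNN-3Conv and counts the learnable parameters of both models by hand. It assumes PyTorch's defaults (bias terms on `Conv2d` and `Linear`, and a learnable scale and shift per channel in `BatchNorm2d`); the helper names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension for a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features


def bn_params(channels):
    """Learnable scale and shift per channel (running stats are not counted)."""
    return 2 * channels


# Spatial size after the three conv + pool blocks of CNN-3Conv.
size = 28
for _ in range(3):
    size = conv_out(size, kernel=3, stride=1, padding=1)  # conv keeps the size
    size = conv_out(size, kernel=2, stride=2)             # pool halves it (floor)
print(size)  # 3

cnn2 = (conv_params(1, 32, 5) + conv_params(32, 64, 5)
        + linear_params(7 * 7 * 64, 1000) + linear_params(1000, 10))
cnn3 = (conv_params(1, 32, 3) + conv_params(32, 64, 3) + conv_params(64, 128, 3)
        + linear_params(3 * 3 * 128, 512) + linear_params(512, 10))
cnn2_bn = cnn2 + bn_params(32) + bn_params(64)
cnn3_bn = cnn3 + bn_params(32) + bn_params(64) + bn_params(128)

print(cnn2, cnn2_bn)  # 3199106 3199298
print(cnn3, cnn3_bn)  # 688138 688586
```

Under these assumptions CNN-3Conv has far fewer parameters than CNN-2Conv, mostly because its first fully connected layer is much smaller, so the two models differ in more than depth.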
Training Function
def train_model(model, train_loader, num_epochs, learning_rate):
    # Uses the global `device` defined in the data loading section
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    for epoch in range(num_epochs):
        model.train()  # enable dropout and BatchNorm batch statistics
        running_loss = 0.0
        for i, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)

            # Forward pass
            output = model(images)
            loss = criterion(output, labels)

            # Backward pass and parameter update
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if (i + 1) % 100 == 0:
                print(f"Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}")

        avg_loss = running_loss / len(train_loader)
        print(f"Epoch [{epoch+1}/{num_epochs}], Average Loss: {avg_loss:.4f}")
    return model
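The models return raw logits because `nn.CrossEntropyLoss` applies log-softmax internally and then averages the negative log-probability of the correct class over the batch. The plain-Python sketch below reproduces that computation for small lists of numbers; `cross_entropy` and `batch_loss` are illustrative helpers, not PyTorch functions.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def batch_loss(batch_logits, targets):
    """Mean loss over a batch, matching the default 'mean' reduction."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# A model that gives all 10 classes equal logits has loss ln(10).
uniform = cross_entropy([0.0] * 10, target=3)
print(round(uniform, 4))  # 2.3026

# A confident correct prediction has a much smaller loss than a wrong one.
confident = [8.0] + [0.0] * 9
print(cross_entropy(confident, 0) < cross_entropy(confident, 1))  # True
```

The value ln(10) ≈ 2.3026 is a handy reference point: a loss near it early in training means the model is still close to guessing uniformly among the 10 classes.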
Test Function
def test_model(model, test_loader):
    model.eval()  # disable dropout and use BatchNorm running statistics
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            # The predicted class is the index of the largest logit
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    print(f"Accuracy on test set: {accuracy:.2f}%")
    return accuracy
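The test function reports a single overall accuracy. A per-class breakdown can show which of the 10 classes the model finds hardest. The sketch below is one possible way to compute it from plain lists of predicted and true labels; the function name and the sample labels are made up for illustration and are not results from this article.

```python
def per_class_accuracy(predictions, labels, num_classes=10):
    """Return accuracy in percent for each class, or None if a class is absent."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return [100 * c / t if t else None for c, t in zip(correct, total)]


# Made-up predictions and labels, for illustration only.
preds = [0, 6, 6, 2, 9, 0]
labels = [0, 6, 0, 2, 9, 6]
acc = per_class_accuracy(preds, labels)
print(acc[0], acc[6], acc[2])  # 50.0 50.0 100.0
```

With a real model, the two lists could be collected inside the evaluation loop by extending them with `predicted.tolist()` and `labels.tolist()` for each batch.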
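Experimental Results
Over multiple rounds of experiments, we obtained the following test accuracies:
CNN-2Conv (two convolutional layers)
| Epochs | Without BatchNorm | With BatchNorm |
|---|---|---|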
| 5 | 90.60% | 89.76% |
| 10 | 90.94% | 91.38% |
| 15 | 91.31% | 91.95% |
| 20 | 91.14% | 91.79% |
CNN-3Conv (three convolutional layers)
| Epochs | Without BatchNorm | With BatchNorm |
|---|---|---|
| 5 | 91.11% | 90.65% |
| 10 | 91.73% | 91.64% |
| 15 | 92.07% | 91.92% |
| 20 | 91.79% | 92.05% |
| 25 | 91.96% | 92.45% |
| 30 | 92.25% | 92.34% |
| 35 | 91.91% | 92.29% |
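Analysis of Results
Effect of Model Depth
The three-layer network (CNN-3Conv) outperforms the two-layer network (CNN-2Conv) overall: its best accuracy is 92.45%, versus 91.95% for CNN-2Conv. This suggests that added depth helps extract richer features. Note, however, that the two models also differ in kernel size (3x3 versus 5x5) and in the width of the first fully connected layer (512 versus 1000), and that CNN-2Conv was trained for at most 20 epochs, so depth is not the only variable.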
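The Role of BatchNorm
- Early in training (5 epochs), the models without BatchNorm score higher: 90.60% versus 89.76% for CNN-2Conv, and 91.11% versus 90.65% for CNN-3Conv
- After enough training, the models with BatchNorm tend to reach higher accuracy: from 10 epochs onward for CNN-2Conv and from 20 epochs onward for CNN-3Conv
- For CNN-2Conv, the best result (91.95%) is obtained with BatchNorm
- For CNN-3Conv, the best result (92.45%) is likewise obtained with BatchNorm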
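The BatchNorm observations can be checked directly against the result tables. The sketch below transcribes the reported accuracies and computes, for each epoch count, the gain from adding BatchNorm and the first epoch count at which the BatchNorm variant is ahead. The helper names are illustrative.

```python
# Test accuracies (%) from the result tables: epochs -> (without BN, with BN)
cnn2 = {5: (90.60, 89.76), 10: (90.94, 91.38), 15: (91.31, 91.95), 20: (91.14, 91.79)}
cnn3 = {
    5: (91.11, 90.65), 10: (91.73, 91.64), 15: (92.07, 91.92), 20: (91.79, 92.05),
    25: (91.96, 92.45), 30: (92.25, 92.34), 35: (91.91, 92.29),
}


def bn_gain(results):
    """Accuracy with BatchNorm minus accuracy without it, per epoch count."""
    return {
        epochs: round(with_bn - without_bn, 2)
        for epochs, (without_bn, with_bn) in results.items()
    }


def first_epoch_bn_ahead(results):
    """Smallest epoch count at which the BatchNorm variant scores higher."""
    for epochs in sorted(results):
        without_bn, with_bn = results[epochs]
        if with_bn > without_bn:
            return epochs
    return None


print(bn_gain(cnn2))
print(bn_gain(cnn3))
print(first_epoch_bn_ahead(cnn2), first_epoch_bn_ahead(cnn3))  # 10 20
```

The gain is negative at 5 epochs for both models and turns positive later, at 10 epochs for CNN-2Conv and at 20 epochs for CNN-3Conv.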
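Effect of the Number of Epochs
- Accuracy generally rises as the number of epochs increases
- Training for too many epochs (for example, 35) may cause slight overfitting: CNN-3Conv with BatchNorm drops from 92.45% at 25 epochs to 92.29% at 35. Differences this small may also reflect run-to-run variation
- For CNN-2Conv, 15 epochs gives the best result
- For CNN-3Conv, 25 epochs gives the best result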
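The choice of epoch count above amounts to picking the row with the highest accuracy. The sketch below does this for the two BatchNorm columns of the result tables and also reports how far accuracy falls from its peak by the last epoch count tried. The function names are illustrative; in practice this selection would normally use a held-out validation split rather than the test set.

```python
# Test accuracies (%) of the BatchNorm variants, from the result tables.
cnn2_epochs = [5, 10, 15, 20]
cnn2_with_bn = [89.76, 91.38, 91.95, 91.79]
cnn3_epochs = [5, 10, 15, 20, 25, 30, 35]
cnn3_with_bn = [90.65, 91.64, 91.92, 92.05, 92.45, 92.34, 92.29]


def best_checkpoint(epochs, accuracies):
    """Return (epoch count, accuracy) of the highest-scoring entry."""
    best_index = max(range(len(accuracies)), key=accuracies.__getitem__)
    return epochs[best_index], accuracies[best_index]


def drop_after_peak(accuracies):
    """Difference between the peak accuracy and the last recorded accuracy."""
    return round(max(accuracies) - accuracies[-1], 2)


print(best_checkpoint(cnn2_epochs, cnn2_with_bn))  # (15, 91.95)
print(best_checkpoint(cnn3_epochs, cnn3_with_bn))  # (25, 92.45)
print(drop_after_peak(cnn2_with_bn), drop_after_peak(cnn3_with_bn))  # 0.16 0.16
```

Both drops are 0.16 percentage points, which is small enough that it is worth treating the overfitting reading as tentative.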
Complete Training Example
# Train the CNN-3Conv model (with BatchNorm)
model = CNN3Conv(use_batchnorm=True)
model = train_model(model, train_loader, num_epochs=25, learning_rate=0.001)
test_model(model, test_loader)

# Save the trained weights
torch.save(model.state_dict(), 'fashion_mnist_cnn.pth')
Model Prediction
# Load the saved model; map_location lets weights saved on a GPU load on a CPU-only machine
model = CNN3Conv(use_batchnorm=True)
model.load_state_dict(torch.load('fashion_mnist_cnn.pth', map_location=device))
model = model.to(device)
model.eval()

# Class labels, indexed by label id
classes = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
]

# Predict the first image in the test set
image, label = test_dataset[0]
with torch.no_grad():
    image = image.unsqueeze(0).to(device)  # add a batch dimension: (1, 1, 28, 28)
    output = model(image)
    _, predicted = torch.max(output, 1)
print(f"Predicted: {classes[predicted.item()]}, Actual: {classes[label]}")
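The prediction above keeps only the top class. To see how confident the model is, the logits can be turned into probabilities with softmax and ranked. The plain-Python sketch below shows the idea; the `softmax` and `top_k` helpers and the logits are hypothetical and not actual model output. With a real output tensor, `torch.softmax(output, dim=1)` computes the same probabilities.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, class_names, k=3):
    """Return the k most probable (class name, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


classes = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

# Hypothetical logits for a single image, for illustration only.
logits = [0.1, -1.2, 0.3, -0.5, 0.2, 2.0, -0.3, 3.1, 0.4, 5.0]
for name, prob in top_k(logits, classes):
    print(f"{name}: {prob:.3f}")
```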
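Summary
This article implemented Fashion MNIST clothing classification in PyTorch and compared how different CNN architectures and BatchNorm layers affect model performance. The main conclusions are:
- Best model: three-layer convolutional network (CNN-3Conv) with BatchNorm, trained for 25 epochs
- Best accuracy: 92.45%
- Key finding: in these experiments, appropriate network depth and BatchNorm layers both improved model performance
Although Fashion MNIST is a relatively simple dataset, the methods and conclusions here can be extended to more complex image classification tasks. I hope this post helps you better understand how CNNs are applied to image classification!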