PyTorch in Practice on Fashion MNIST: Comparing CNN Architectures and BatchNorm Layers, Best Accuracy 92.45%

2024-06-21, Artificial Intelligence
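This article implements Fashion MNIST image classification in PyTorch and compares two-layer and three-layer convolutional networks (CNNs), each with and without BatchNorm layers. In the experiments, the three-layer CNN generally outperforms the two-layer one, with a best test accuracy of 92.45%. BatchNorm yields higher accuracy once the models are trained long enough, although the models without BatchNorm converge faster and score higher after the first 5 epochs. The number of epochs should be moderate, since training for too long can lead to overfitting. The best configuration is CNN-3Conv with BatchNorm, trained for 25 epochs. The article provides complete code for data loading, model definition, training, and testing, along with an analysis of the results; the conclusions may carry over to more complex image classification tasks.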

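Fashion MNIST is a classic image classification dataset released by Zalando Research as a replacement for the traditional MNIST handwritten-digit dataset. It contains fashion images in 10 classes. Each image is a 28x28 grayscale picture; the training set holds 60,000 images and the test set 10,000. This article walks through implementing Fashion MNIST classification in PyTorch and compares how different CNN architectures and BatchNorm layers affect model performance.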

Dataset Overview

Fashion MNIST contains the following 10 classes:

Label Class
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot

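Environment Setup

First, make sure PyTorch and torchvision are installed (the command below also installs matplotlib, which the code in this article does not import):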

pip install torch torchvision matplotlib

Data Loading and Preprocessing

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 100
learning_rate = 0.001

# Preprocessing: convert to a tensor in [0, 1], then normalize to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(), 
    transforms.Normalize((0.5,), (0.5,))
])

# Download and load the dataset
train_dataset = datasets.FashionMNIST(
    root="./data", train=True, transform=transform, download=True
)
test_dataset = datasets.FashionMNIST(
    root="./data", train=False, transform=transform, download=True
)

train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
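
As a quick sanity check on the preprocessing and loader settings above, the sketch below redoes the arithmetic in plain Python, without PyTorch. ToTensor scales 8-bit pixel values to [0, 1], and Normalize((0.5,), (0.5,)) then computes (x - 0.5) / 0.5, which gives values in [-1, 1]. The helper names normalize_pixel and steps_per_epoch are made up for this illustration and are not part of torchvision.

```python
import math

def normalize_pixel(raw, mean=0.5, std=0.5):
    """Mimic ToTensor followed by Normalize for one 8-bit grayscale pixel."""
    scaled = raw / 255.0            # ToTensor: uint8 [0, 255] -> float [0, 1]
    return (scaled - mean) / std    # Normalize: (x - mean) / std

def steps_per_epoch(num_samples, batch_size):
    """Number of batches a DataLoader yields per epoch when drop_last=False."""
    return math.ceil(num_samples / batch_size)

print(normalize_pixel(0), normalize_pixel(255))   # -1.0 1.0
print(steps_per_epoch(60000, 100))                # 600 training batches
print(steps_per_epoch(10000, 100))                # 100 test batches
```

With 600 batches per epoch, a training loop that logs every 100 steps prints six step lines per epoch.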

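Model Architectures

Two-Layer Convolutional Network (CNN-2Conv)

This model consists of two convolutional blocks. Each block is made up of a convolution layer, an optional BatchNorm layer (controlled by use_batchnorm), a ReLU activation, and max pooling: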

class CNN2Conv(nn.Module):
    def __init__(self, use_batchnorm=True):
        super().__init__()
        
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
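        self.dropout = nn.Dropout()  # default drop probability p=0.5
        # 28x28 input -> 14x14 after layer1 -> 7x7 after layer2, with 64 channels
        self.fc1 = nn.Linear(7 * 7 * 64, 1000)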
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.dropout(out)
        out = self.fc1(out)  # note: no activation between fc1 and fc2
        out = self.fc2(out)  # raw logits; nn.CrossEntropyLoss applies log-softmax
        return out
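
The input size of fc1 (7 * 7 * 64) follows from the layer settings above. As a minimal sketch, the standard output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1, can be traced through the blocks in plain Python. The helper names conv_out and flattened_features are hypothetical, and the sketch assumes dilation 1 and the default floor rounding of MaxPool2d.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d or MaxPool2d layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(input_size, kernel, padding, channels):
    """Trace one spatial dimension through a stack of conv -> 2x2 max-pool blocks.

    `channels` lists the output channels of each block.
    Returns (final spatial size, number of flattened features).
    """
    size = input_size
    for _ in channels:
        size = conv_out(size, kernel, stride=1, padding=padding)  # conv (size-preserving for these paddings)
        size = conv_out(size, kernel=2, stride=2)                 # 2x2 pooling halves the size, rounding down
    return size, size * size * channels[-1]

# Two blocks with 5x5 kernels and padding 2, as in CNN-2Conv
print(flattened_features(28, kernel=5, padding=2, channels=[32, 64]))       # (7, 3136)
# Three blocks with 3x3 kernels and padding 1
print(flattened_features(28, kernel=3, padding=1, channels=[32, 64, 128]))  # (3, 1152)
```

The second call shows why a third pooling stage leaves a 3x3 map: 7 does not divide evenly by 2, and the pooling layer rounds down.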

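Three-Layer Convolutional Network (CNN-3Conv)

This model adds a third convolutional block and switches from 5x5 to 3x3 kernels, aiming to extract richer features: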

class CNN3Conv(nn.Module):
    def __init__(self, use_batchnorm=True):
        super().__init__()
        
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128) if use_batchnorm else nn.Identity(),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
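        self.dropout = nn.Dropout()  # default drop probability p=0.5
        # 28x28 input -> 14x14 -> 7x7 -> 3x3 (pooling rounds down), with 128 channels
        self.fc1 = nn.Linear(3 * 3 * 128, 512)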
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.reshape(out.size(0), -1)
        out = self.dropout(out)
        out = self.fc1(out)  # note: no activation between fc1 and fc2
        out = self.fc2(out)  # raw logits; nn.CrossEntropyLoss applies log-softmax
        return out
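
To put the two architectures side by side, the following sketch counts their learnable parameters by hand, using the layer sizes from the two class definitions above. It is a back-of-the-envelope calculation in plain Python rather than a call into PyTorch. The function names are hypothetical, and BatchNorm running statistics are left out because they are buffers, not learnable parameters.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out      # kernel weights + one bias per output channel

def bn_params(channels):
    return 2 * channels                       # learnable scale and shift per channel

def linear_params(n_in, n_out):
    return n_in * n_out + n_out               # weight matrix + bias vector

def count_params(conv_channels, kernel, fc_sizes, use_batchnorm=True):
    """Sum learnable parameters over the conv blocks and fully connected layers."""
    total = 0
    for c_in, c_out in zip(conv_channels, conv_channels[1:]):
        total += conv_params(c_in, c_out, kernel)
        if use_batchnorm:
            total += bn_params(c_out)
    for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
        total += linear_params(n_in, n_out)
    return total

cnn2 = count_params([1, 32, 64], kernel=5, fc_sizes=[7 * 7 * 64, 1000, 10])
cnn3 = count_params([1, 32, 64, 128], kernel=3, fc_sizes=[3 * 3 * 128, 512, 10])
print(cnn2, cnn3)  # 3199298 688586
```

Under these formulas CNN-3Conv has roughly a fifth of the parameters of CNN-2Conv, mostly because its first fully connected layer is much smaller. The two models therefore differ in more than depth.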

Training Function

def train_model(model, train_loader, num_epochs, learning_rate):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        
        for i, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)
            
            output = model(images)
            loss = criterion(output, labels)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            if (i + 1) % 100 == 0:
                print(f"Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}")
        
        avg_loss = running_loss / len(train_loader)
        print(f"Epoch [{epoch+1}/{num_epochs}], Average Loss: {avg_loss:.4f}")
    
    return model
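
The training function passes the raw model outputs (logits) to nn.CrossEntropyLoss, which applies log-softmax internally and then takes the negative log-likelihood of the true class. The sketch below reproduces that computation for a single sample in plain Python, as an illustration of what the logged loss values mean. The function name cross_entropy is ours, not a PyTorch API.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)                                   # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# A model with no preference among 10 classes gives a loss of ln(10)
uniform = [0.0] * 10
print(round(cross_entropy(uniform, 3), 4))            # 2.3026

# A confident, correct prediction drives the loss toward zero
confident = [0.0] * 10
confident[3] = 10.0
print(cross_entropy(confident, 3) < 0.001)            # True
```

As a rough rule of thumb, a loss near 2.30 in the first few steps is what a freshly initialized 10-class classifier tends to produce; values well below that indicate the model has started to learn.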

Test Function

def test_model(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    accuracy = 100 * correct / total
    print(f"Accuracy on test set: {accuracy:.2f}%")
    return accuracy
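
The test function reports a single overall accuracy. If a per-class breakdown is wanted, one possible approach is sketched below in plain Python. The helper per_class_accuracy is hypothetical, and the toy predictions are made up purely to show the mechanics; they are not results from the experiments.

```python
from collections import Counter

CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def per_class_accuracy(predictions, labels, num_classes=10):
    """Return {class index: accuracy in percent} for classes present in `labels`."""
    correct = Counter()
    total = Counter()
    for p, y in zip(predictions, labels):
        total[y] += 1
        if p == y:
            correct[y] += 1
    return {c: 100.0 * correct[c] / total[c] for c in range(num_classes) if total[c] > 0}

# Toy data for illustration only
labels      = [0, 0, 6, 6, 6, 9]
predictions = [0, 6, 6, 6, 0, 9]
for c, acc in per_class_accuracy(predictions, labels).items():
    print(f"{CLASSES[c]}: {acc:.1f}%")
```

To use it with the test loop above, one would collect predicted.tolist() and labels.tolist() across all batches and pass the two lists to the helper.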

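Experimental Results

After several rounds of experiments, we obtained the following test accuracies:

CNN-2Conv (two convolutional layers)

Epochs  Without BatchNorm  With BatchNorm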
5 90.60% 89.76%
10 90.94% 91.38%
15 91.31% 91.95%
20 91.14% 91.79%

CNN-3Conv (three convolutional layers)

Epochs  Without BatchNorm  With BatchNorm
5 91.11% 90.65%
10 91.73% 91.64%
15 92.07% 91.92%
20 91.79% 92.05%
25 91.96% 92.45%
30 92.25% 92.34%
35 91.91% 92.29%
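
The two tables above are small enough to check by hand, but a short script makes the comparisons explicit. The sketch below copies the reported accuracies into a dictionary and prints, for each architecture, the best run in each column and the difference between the BatchNorm and non-BatchNorm results at every epoch count. The names RESULTS and best_run are ours.

```python
# Test accuracy (%) copied from the tables: epochs -> (without BatchNorm, with BatchNorm)
RESULTS = {
    "CNN-2Conv": {5: (90.60, 89.76), 10: (90.94, 91.38), 15: (91.31, 91.95), 20: (91.14, 91.79)},
    "CNN-3Conv": {5: (91.11, 90.65), 10: (91.73, 91.64), 15: (92.07, 91.92), 20: (91.79, 92.05),
                  25: (91.96, 92.45), 30: (92.25, 92.34), 35: (91.91, 92.29)},
}

def best_run(table, column):
    """Return (epochs, accuracy) of the best entry in a column (0 = no BatchNorm, 1 = BatchNorm)."""
    epochs = max(table, key=lambda e: table[e][column])
    return epochs, table[epochs][column]

for name, table in RESULTS.items():
    print(name, "| best without BN:", best_run(table, 0), "| best with BN:", best_run(table, 1))
    for epochs, (no_bn, bn) in table.items():
        # Positive values mean BatchNorm is ahead at this epoch count
        print(f"  {epochs:>2} epochs: BN minus no-BN = {bn - no_bn:+.2f} points")
```

Each cell in the tables is a single reported number, so small differences between neighboring cells should be read with some caution.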

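Analysis

Effect of Model Depth

The three-layer network (CNN-3Conv) generally outperforms the two-layer network (CNN-2Conv): its best accuracy is 92.45%, compared with 91.95% for CNN-2Conv, a gap of 0.50 percentage points. This suggests that the added depth helps extract richer features, although the two models also differ in kernel size (3x3 vs. 5x5) and in the width of the fully connected layer, so depth is not the only variable.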

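Effect of BatchNorm

  • Early in training, the models without BatchNorm converge faster: at 5 epochs they score higher for both architectures (90.60% vs. 89.76% for CNN-2Conv, 91.11% vs. 90.65% for CNN-3Conv)
  • With longer training, the BatchNorm models tend to reach higher accuracy: from 10 epochs onward for CNN-2Conv and from 20 epochs onward for CNN-3Conv
  • For CNN-2Conv, the best result (91.95%) is obtained with BatchNorm
  • For CNN-3Conv, the best result (92.45%) is likewise obtained with BatchNorm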

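Effect of the Number of Epochs

  • Accuracy generally rises with the number of epochs until it reaches a peak
  • Training for too many epochs (e.g., 35) may cause slight overfitting: at 35 epochs CNN-3Conv falls back from its peak to 92.29% with BatchNorm and to 91.91% without
  • For CNN-2Conv, 15 epochs is the best choice
  • For CNN-3Conv, 25 epochs is the best choice with BatchNorm (without it, the peak is 92.25% at 30 epochs)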

Full Training Example

# Train the CNN-3Conv model (with BatchNorm)
model = CNN3Conv(use_batchnorm=True)
model = train_model(model, train_loader, num_epochs=25, learning_rate=0.001)
test_model(model, test_loader)

# Save the model weights (state_dict)
torch.save(model.state_dict(), 'fashion_mnist_cnn.pth')

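Model Prediction

# Load the saved model weights
model = CNN3Conv(use_batchnorm=True)
# map_location lets weights saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load('fashion_mnist_cnn.pth', map_location=device))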
model = model.to(device)
model.eval()

# Class labels
classes = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
]

# Predict the first image in the test set
image, label = test_dataset[0]
with torch.no_grad():
    image = image.unsqueeze(0).to(device)
    output = model(image)
    _, predicted = torch.max(output, 1)
    print(f"Predicted: {classes[predicted[0]]}, Actual: {classes[label]}")

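Summary

This article implemented the Fashion MNIST clothing classification task in PyTorch and compared how different CNN architectures and BatchNorm layers affect model performance. The main conclusions are:

  • Best model: three-layer convolutional network (CNN-3Conv) + BatchNorm, trained for 25 epochs
  • Best accuracy: 92.45%
  • Key finding: adequate network depth and BatchNorm layers both improved test accuracy in these experiments

Although Fashion MNIST is a relatively simple dataset, the methods and conclusions here may extend to more complex image classification tasks. Hopefully this post helps you better understand how CNNs are applied to image classification!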

