PyTorch Dogs vs. Cats: CNN vs. VGG16 Transfer Learning, Which Wins?

2025-02-06 Artificial Intelligence
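This article presents two PyTorch approaches to classifying images of cats and dogs: a CNN built from scratch, and transfer learning with VGG16. The CNN needs more training epochs. Its test accuracy peaks at 79.28% after 20 epochs and then declines, a sign of overfitting. VGG16 reaches 86.50% after only 5 epochs and peaks at 86.98% after 10, clearly outperforming the CNN. The experiments show that when training data is limited, transfer learning from a pretrained model markedly improves both training efficiency and classification accuracy, which makes it the better choice.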

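Cat-vs-dog classification is a classic introductory computer vision project, based on the Dogs vs. Cats competition on Kaggle. This article implements two approaches in PyTorch, a CNN built from scratch and a VGG16 network adapted through transfer learning, and compares their performance.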

Environment setup

pip install torch torchvision Pillow

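Dataset preparation

The dataset contains 25,000 labeled images of cats and dogs. We split them at random into 20,000 images for training and 5,000 for testing.

Custom Dataset class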

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import os

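# Fix the train/test split: a seeded generator yields the same permutation on every run
dataset_size = 25000  # number of images in Kaggle's train folder
generator = torch.Generator().manual_seed(42)  # seed value chosen arbitrarily
indices = torch.randperm(dataset_size, generator=generator)
train_indices = indices[:20000].tolist()
test_indices = indices[20000:].tolist()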

class DogsVsCatsDataset(Dataset):
    def __init__(self, root, train=True, transform=None):
        super().__init__()
        self.root = root
        self.transform = transform
        self.files = []
        self.labels = []

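        # os.listdir() returns files in arbitrary order; sorting keeps the
        # index-based split stable across machines and runs
        files = sorted(os.listdir(root))
        index = train_indices if train else test_indices

        for i in index:
            file = files[i]
            self.files.append(file)
            # Label 0 if the file name contains "dog", otherwise 1 (cat)
            self.labels.append(0 if "dog" in file else 1)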

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        path = os.path.join(self.root, self.files[index])
        image = Image.open(path).convert("RGB")
        label = self.labels[index]
        if self.transform:
            image = self.transform(image)
        return image, label
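The split and labeling rules can be checked in plain Python without the images. This is an illustrative sketch, not the article's code: it swaps torch.randperm for Python's random.Random(seed).shuffle, assumes Kaggle's naming pattern cat.<n>.jpg / dog.<n>.jpg with 12,500 images of each class, and uses hypothetical helper names (split_indices, label_from_filename) and an assumed seed of 42.

```python
import random


def split_indices(n_total, n_train, seed=42):
    """Shuffle 0..n_total-1 with a seeded RNG and cut it into train/test index lists."""
    order = list(range(n_total))
    random.Random(seed).shuffle(order)
    return order[:n_train], order[n_train:]


def label_from_filename(name):
    """Same rule as DogsVsCatsDataset: 0 for dog, 1 for cat."""
    return 0 if "dog" in name else 1


# Hypothetical file names following the assumed Kaggle pattern
files = sorted([f"cat.{i}.jpg" for i in range(12500)] +
               [f"dog.{i}.jpg" for i in range(12500)])

train_idx, test_idx = split_indices(len(files), 20000)
print(len(train_idx), len(test_idx))        # 20000 5000
print(set(train_idx).isdisjoint(test_idx))  # True
print(label_from_filename("dog.7.jpg"), label_from_filename("cat.7.jpg"))  # 0 1
```

Seeding the permutation is what makes the split reproducible; without a seed, every run would evaluate on a different set of 5,000 images, and accuracies from separate runs would not be directly comparable.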

Data augmentation and preprocessing

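# Training: random augmentation, then ImageNet normalization
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=30),
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

# Testing: deterministic resize and normalization only, so the measured
# accuracy does not depend on random augmentation
test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

train_dataset = DogsVsCatsDataset(
    root="./data/Dogs Vs Cats/train",
    train=True,
    transform=train_transform
)
# The test split also comes from Kaggle's labeled train folder
test_dataset = DogsVsCatsDataset(
    root="./data/Dogs Vs Cats/train",
    train=False,
    transform=test_transform
)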

train_loader = DataLoader(train_dataset, batch_size=50, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=50, shuffle=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
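The Normalize step in the transform above uses the ImageNet per-channel means and standard deviations, the same statistics the pretrained VGG16 weights expect. As a worked example, this plain-Python sketch (the helper name normalize_pixel is hypothetical) shows where a single 8-bit pixel value lands after ToTensor() scales it to [0, 1] and Normalize() applies (x - mean) / std.

```python
# ImageNet statistics used in transforms.Normalize (R, G, B)
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(value, channel):
    """Map an 8-bit pixel value the way ToTensor() + Normalize() do for one channel."""
    scaled = value / 255.0                          # ToTensor(): [0, 255] -> [0.0, 1.0]
    return (scaled - MEAN[channel]) / STD[channel]  # Normalize(): (x - mean) / std


for ch, name in enumerate("RGB"):
    lo, hi = normalize_pixel(0, ch), normalize_pixel(255, ch)
    print(f"{name}: [{lo:.4f}, {hi:.4f}]")
```

Every channel ends up roughly between -2.1 and 2.6 and centred near zero for typical ImageNet-like images, instead of spanning [0, 1].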

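Method 1: Building a CNN from scratch

Model architecture

We build a network with four convolutional blocks (each a 3×3 convolution, batch normalization, and 2×2 max pooling), followed by dropout and two fully connected layers: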

class CNNModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn1 = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(24),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.cnn2 = nn.Sequential(
            nn.Conv2d(24, 48, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(48),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.cnn3 = nn.Sequential(
            nn.Conv2d(48, 96, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(96),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.cnn4 = nn.Sequential(
            nn.Conv2d(96, 48, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(48),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
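        self.dropout = nn.Dropout()  # default drop probability p=0.5
        # Four 2x2 poolings shrink 224x224 to 14x14; cnn4 outputs 48 channels
        self.line1 = nn.Linear(14 * 14 * 48, 512)
        self.line2 = nn.Linear(512, 2)  # two logits: dog (0) and cat (1)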

    def forward(self, x):
        out = self.cnn1(x)
        out = self.cnn2(out)
        out = self.cnn3(out)
        out = self.cnn4(out)
        out = out.reshape(out.size(0), -1)
        out = self.dropout(out)
        out = self.line1(out)
        out = self.line2(out)
        return out
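The 14 * 14 * 48 input size of line1 follows from the image size. Each 3×3 convolution with padding=1 and stride 1 keeps the spatial size, and each 2×2 max pool with stride 2 halves it. A short plain-Python check using the standard output-size formula (dilation 1, floor rounding); the helper name conv_out is hypothetical:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of Conv2d / MaxPool2d with dilation 1 (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
for block in range(1, 5):
    size = conv_out(size, kernel=3, stride=1, padding=1)  # Conv2d keeps the size
    size = conv_out(size, kernel=2, stride=2)             # MaxPool2d halves it
    print(f"after cnn{block}: {size}x{size}")             # 112, 56, 28, 14

flat = size * size * 48  # cnn4 outputs 48 channels
print(flat)              # 9408, the in_features of line1
```

If the input resolution changes (for example to 128×128), line1's input size must change with it, otherwise the reshape in forward() produces a tensor whose width does not match line1.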

Training the CNN model

num_epochs = 20
learning_rate = 0.001

model = CNNModel().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

model.train()
for epoch in range(num_epochs):
    for i, (image, label) in enumerate(train_loader):
        image = image.to(device)
        label = label.to(device)

        output = model(image)
        loss = criterion(output, label)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 50 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}], "
                  f"Step [{i+1}/{len(train_loader)}], "
                  f"Loss: {loss.item():.4f}")

# Evaluate on the test set
model.eval()
with torch.no_grad():
    total = 0
    correct = 0
    for image, label in test_loader:
        image = image.to(device)
        label = label.to(device)
        output = model(image)
        _, predict = torch.max(output, 1)
        total += len(label)
        correct += (predict == label).sum().item()
    
    print(f"Accuracy: {correct / total * 100:.2f}%")
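The training and evaluation code above can be wrapped in two reusable functions. This makes it easy to report test accuracy at several epoch counts and to reuse the same loop for another model. A minimal sketch: the function names train_one_epoch and evaluate are hypothetical, and the smoke test runs on random tensors with a tiny stand-in model rather than on the image data or CNNModel.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    """One pass over `loader` with weight updates; returns the mean batch loss."""
    model.train()
    total_loss, batches = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        batches += 1
    return total_loss / max(batches, 1)


@torch.no_grad()  # no gradient tracking during evaluation
def evaluate(model, loader, device):
    """Returns classification accuracy on `loader` as a fraction in [0, 1]."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        predictions = model(images).argmax(dim=1)  # index of the largest logit
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return correct / total


# Smoke test: random features and labels, tiny stand-in model
torch.manual_seed(0)
demo_device = torch.device("cpu")
data = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
tiny = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(tiny.parameters(), lr=1e-3)

for epoch in range(3):
    loss = train_one_epoch(tiny, loader, nn.CrossEntropyLoss(), opt, demo_device)
    acc = evaluate(tiny, loader, demo_device)
    print(f"epoch {epoch + 1}: loss {loss:.4f}, accuracy {acc:.2%}")
```

With the article's objects, the same loop would call train_one_epoch(model, train_loader, criterion, optimizer, device) followed by evaluate(model, test_loader, device) once per epoch.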

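Method 2: Transfer learning with VGG16

How transfer learning works

VGG16 was pretrained on ImageNet (about 1.28 million images in 1,000 classes) and has already learned rich, general-purpose image features. We freeze its convolutional layers (model.features) and train only the fully connected classifier, whose last layer is replaced with a 2-class output.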
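Freezing works through requires_grad: parameters with requires_grad = False receive no gradient, so the optimizer never changes them. The sketch below demonstrates this on a tiny, hypothetical stand-in model (TinyVGGLike) that copies only VGG16's features/classifier layout, so it runs in a moment without downloading pretrained weights.

```python
import torch
import torch.nn as nn


class TinyVGGLike(nn.Module):
    """Hypothetical stand-in with VGG16's two-part layout: features, then classifier."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(8, 2))

    def forward(self, x):
        return self.classifier(self.features(x))


torch.manual_seed(0)
demo = TinyVGGLike()
for param in demo.features.parameters():
    param.requires_grad = False  # frozen: receives no gradient

trainable = sum(p.numel() for p in demo.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in demo.parameters() if not p.requires_grad)
print(f"trainable: {trainable}, frozen: {frozen}")  # trainable: 18, frozen: 224

conv_before = demo.features[0].weight.detach().clone()
head_before = demo.classifier[1].weight.detach().clone()

# One optimization step on a random batch, optimizing only trainable parameters
opt = torch.optim.Adam([p for p in demo.parameters() if p.requires_grad], lr=1e-2)
logits = demo(torch.randn(4, 3, 32, 32))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 0, 1]))
opt.zero_grad()
loss.backward()
opt.step()

print(torch.equal(conv_before, demo.features[0].weight))   # True: frozen layer unchanged
print(torch.equal(head_before, demo.classifier[1].weight))  # False: head was updated
```

For the real VGG16 the same principle applies at a larger scale: the convolutional part holds roughly 14.7 million frozen parameters, while the fully connected classifier holds most of the network's weights and is the only part updated.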

import torchvision.models as models

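num_epochs = 10
batch_size = 10
learning_rate = 0.001

# Rebuild the data loaders so that the VGG16 run actually uses batch_size = 10
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)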

# Load VGG16 with ImageNet-pretrained weights (weights API of torchvision >= 0.13)
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the convolutional (feature-extraction) layers
for param in model.features.parameters():
    param.requires_grad = False

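# Replace the last classifier layer (4096 -> 1000 ImageNet classes) with a new
# 4096 -> 2 layer; assigning out_features alone would not resize its weights
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

model = model.to(device)
criterion = nn.CrossEntropyLoss()
# Only the classifier is trainable, so pass just its parameters to the optimizer
optimizer = optim.Adam(model.classifier.parameters(), lr=learning_rate)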

# Training and evaluation loop (same as for the CNN above)
# ...
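A common pitfall when adapting the classifier head: assigning out_features on an existing nn.Linear only changes a stored integer. The weight matrix keeps its 1000 rows, so the layer still produces 1000 logits, and CrossEntropyLoss silently treats the task as a 1000-class problem in which only classes 0 and 1 ever appear. The layer has to be replaced with a new one. A minimal check, with shapes chosen to match VGG16's last classifier layer (4096 inputs, 1000 outputs):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 4096)

# Editing the attribute: the weight matrix still has 1000 rows
edited = nn.Linear(4096, 1000)
edited.out_features = 2
print(edited.weight.shape, edited(x).shape)      # torch.Size([1000, 4096]) torch.Size([1, 1000])

# Replacing the layer: a fresh 4096 -> 2 layer with new, trainable weights
replaced = nn.Linear(edited.in_features, 2)
print(replaced.weight.shape, replaced(x).shape)  # torch.Size([2, 4096]) torch.Size([1, 2])
```

A freshly created layer also has requires_grad = True by default, so it is trained even though the rest of the network is frozen.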

Experimental results

Method   Epochs   Test accuracy
CNN      5        67.64%
CNN      10       74.92%
CNN      15       73.42%
CNN      20       79.28%
CNN      25       78.28%
VGG16    5        86.50%
VGG16    10       86.98%
VGG16    15       85.42%

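Analysis

CNN: accuracy generally rises with more epochs (with a dip at 15) and peaks at 79.28% after 20 epochs. At 25 epochs it falls back to 78.28%, suggesting the model has begun to overfit.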

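VGG16: reaches 86.50% after only 5 epochs, well above the CNN's best result at 20 epochs, and peaks at 86.98% after 10 epochs before dipping to 85.42% at 15. Transfer learning clearly improves both training efficiency and final accuracy.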
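To quantify the gap, it helps to compare error rates as well as accuracies. A small calculation using only values from the results table:

```python
# Values from the results table (percent test accuracy)
cnn_best = 79.28  # CNN, 20 epochs
vgg_5 = 86.50     # VGG16, 5 epochs

gap = vgg_5 - cnn_best                              # percentage points
cnn_err, vgg_err = 100 - cnn_best, 100 - vgg_5      # test error rates
relative_err_cut = (cnn_err - vgg_err) / cnn_err    # fraction of errors removed

print(f"accuracy gap: {gap:.2f} percentage points")
print(f"error rate: {cnn_err:.2f}% -> {vgg_err:.2f}% ({relative_err_cut:.1%} fewer errors)")
```

By this measure, VGG16 after 5 epochs misclassifies roughly a third fewer test images than the best CNN run.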

Conclusion: when training data is limited, transfer learning from a pretrained model is the better choice.

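Summary

This article implemented two approaches to cat-vs-dog classification in PyTorch:

  • Custom CNN: learns features from scratch and needs more epochs and more data
  • VGG16 transfer learning: reuses pretrained knowledge and reaches high accuracy quickly

In real projects, consider transfer learning first, especially when the dataset is small.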


Dataset download: https://www.kaggle.com/competitions/dogs-vs-cats/data

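This article was written by a human and polished with AI. If you repost it, please credit the original: PyTorch Dogs vs. Cats: CNN vs. VGG16 Transfer Learning, Which Wins?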
