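PyTorch Cats vs. Dogs: CNN vs. VGG16 Transfer Learning, Which One Wins?
This article walks through two ways to classify cat and dog images in PyTorch: a CNN built from scratch and transfer learning with VGG16. The CNN needs more epochs: it peaks at 79.28% accuracy at 20 epochs, and accuracy slips to 78.28% at 25 epochs, which suggests overfitting. VGG16 reaches 86.50% after only 5 epochs and 86.98% after 10, clearly ahead of the CNN. The experiments indicate that, when training data is limited, transfer learning from a pretrained model can noticeably improve both training efficiency and classification accuracy, making it the better choice here.
Cat vs. dog classification is a classic introductory computer vision project based on Kaggle's Dogs vs. Cats competition. This article implements two approaches in PyTorch, a CNN built from scratch and a VGG16 network with transfer learning, and compares their performance.
Environment Setup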
pip install torch torchvision Pillow
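Dataset Processing
The dataset contains 25,000 images of cats and dogs. We randomly select 20,000 of them for training and hold out the remaining 5,000 for testing.
Custom Dataset Class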
import os

import torch
import torch.nn as nn
import torch.optim as optim
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# Fix the train/test split: seed the permutation so every run (and both
# Dataset instances) sees the same split. The seed value 0 is arbitrary.
total_size = 25000
generator = torch.Generator().manual_seed(0)
indices = torch.randperm(total_size, generator=generator).tolist()
train_indices = indices[:20000]
test_indices = indices[20000:]


class DogsVsCatsDataset(Dataset):
    def __init__(self, root, train=True, transform=None):
        super().__init__()
        self.root = root
        self.transform = transform
        self.files = []
        self.labels = []
        # Sort the listing: os.listdir returns entries in arbitrary order
        files = sorted(os.listdir(root))
        index = train_indices if train else test_indices
        for i in index:
            file = files[i]
            self.files.append(file)
            # Label 0 if the file name contains "dog", otherwise 1 (cat)
            self.labels.append(0 if "dog" in file else 1)

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        path = os.path.join(self.root, self.files[index])
        image = Image.open(path).convert("RGB")
        label = self.labels[index]
        if self.transform:
            image = self.transform(image)
        return image, label
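To make the split and labeling logic concrete, here is a minimal sketch that runs on made-up file names instead of the real image folder. The helper names `split_files` and `label_of`, the seed value, and the file names are assumptions for illustration and not part of the original code.

```python
import torch


def split_files(files, n_train, seed=0):
    """Split a file list into train/test parts with a seeded permutation."""
    files = sorted(files)  # make the order independent of the file system
    generator = torch.Generator().manual_seed(seed)  # hypothetical seed
    order = torch.randperm(len(files), generator=generator).tolist()
    train = [files[i] for i in order[:n_train]]
    test = [files[i] for i in order[n_train:]]
    return train, test


def label_of(file_name):
    """Same rule as the Dataset class: 0 for dog, 1 for cat."""
    return 0 if "dog" in file_name else 1


# Made-up file names ("cat.0.jpg", "dog.0.jpg", ...)
names = [f"{kind}.{i}.jpg" for kind in ("cat", "dog") for i in range(5)]
train_files, test_files = split_files(names, n_train=8)
print(len(train_files), len(test_files))
print([(name, label_of(name)) for name in test_files])
```

Because the permutation is seeded and the file list is sorted, calling `split_files` twice with the same arguments gives the same two lists, so the training and test sets cannot overlap between runs.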
Data Augmentation and Preprocessing
# Resize to the 224x224 input size, apply random augmentation,
# then convert to a tensor and normalize with the ImageNet statistics
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=30),
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

# Both splits read from the same folder; the train flag selects the indices
train_dataset = DogsVsCatsDataset(
    root="./data/Dogs Vs Cats/train",
    train=True,
    transform=transform
)
test_dataset = DogsVsCatsDataset(
    root="./data/Dogs Vs Cats/train",
    train=False,
    transform=transform
)

train_loader = DataLoader(train_dataset, batch_size=50, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=50, shuffle=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
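The `Normalize` step computes (x - mean) / std for each channel, after `ToTensor` has scaled pixel values to the range [0, 1]. The sketch below reproduces that arithmetic with plain tensor operations on a random stand-in image. The helper names `normalize` and `denormalize` are assumptions for illustration.

```python
import torch

# ImageNet channel statistics, the same values passed to Normalize
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def normalize(image):
    """Per-channel (x - mean) / std on a 3xHxW tensor with values in [0, 1]."""
    return (image - mean) / std


def denormalize(image):
    """Inverse mapping, useful before plotting a normalized tensor."""
    return image * std + mean


image = torch.rand(3, 4, 4)  # stand-in for the output of ToTensor()
normalized = normalize(image)
restored = denormalize(normalized)
print(normalized.mean(dim=(1, 2)))  # per-channel means after normalization
print(torch.allclose(restored, image, atol=1e-6))
```

The inverse mapping is worth keeping around: a normalized tensor contains negative values and looks wrong if displayed directly.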
Method 1: Building a CNN from Scratch
Model Architecture
We build a convolutional neural network with four convolutional blocks:
class CNNModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Each block: 3x3 convolution, batch normalization, 2x2 max pooling.
        # The pooling halves the spatial size: 224 -> 112 -> 56 -> 28 -> 14.
        self.cnn1 = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(24),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.cnn2 = nn.Sequential(
            nn.Conv2d(24, 48, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(48),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.cnn3 = nn.Sequential(
            nn.Conv2d(48, 96, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(96),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.cnn4 = nn.Sequential(
            nn.Conv2d(96, 48, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(48),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.dropout = nn.Dropout()  # default drop probability p=0.5
        self.line1 = nn.Linear(14 * 14 * 48, 512)
        self.line2 = nn.Linear(512, 2)

    def forward(self, x):
        out = self.cnn1(x)
        out = self.cnn2(out)
        out = self.cnn3(out)
        out = self.cnn4(out)
        # Flatten (N, 48, 14, 14) to (N, 9408) for the linear layers
        out = out.reshape(out.size(0), -1)
        out = self.dropout(out)
        out = self.line1(out)
        out = self.line2(out)
        return out
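The input size of the first linear layer, 14 * 14 * 48, follows from the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The short sketch below traces a 224x224 input through the four blocks. The helper name `conv_out` is an assumption for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
for block in range(1, 5):
    size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
    size = conv_out(size, kernel=2, stride=2)             # 2x2 max pool halves it
    print(f"after block {block}: {size}x{size}")

flat_features = size * size * 48  # 48 channels leave the last block
print(flat_features)
```

If the input resolution in `Resize` changes, the same calculation gives the new value to use in `nn.Linear`.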
Training the CNN Model
num_epochs = 20
learning_rate = 0.001

model = CNNModel().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

model.train()
for epoch in range(num_epochs):
    for i, (image, label) in enumerate(train_loader):
        image = image.to(device)
        label = label.to(device)

        # Forward pass and loss
        output = model(image)
        loss = criterion(output, label)

        # Backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 50 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}], "
                  f"Step [{i+1}/{len(train_loader)}], "
                  f"Loss: {loss.item():.4f}")

# Evaluate on the test set
model.eval()
with torch.no_grad():
    total = 0
    correct = 0
    for image, label in test_loader:
        image = image.to(device)
        label = label.to(device)
        output = model(image)
        _, predict = torch.max(output, 1)
        total += len(label)
        correct += (predict == label).sum().item()
    print(f"Accuracy: {correct / total * 100:.2f}%")
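To record accuracy at several points during training, one option is to wrap the test loop in a function and call it at each checkpoint. The sketch below is one possible way to do that, not the author's code: the function name `evaluate` is an assumption, and a tiny linear model with random tensors stands in for the CNN and the image data so the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return accuracy in percent over all batches in loader."""
    was_training = model.training
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for image, label in loader:
            image = image.to(device)
            label = label.to(device)
            predict = model(image).argmax(dim=1)
            total += label.size(0)
            correct += (predict == label).sum().item()
    model.train(was_training)  # restore the mode the model was in before
    return correct / total * 100


# Stand-in data and model so the sketch runs without the image folder
device = torch.device("cpu")
features = torch.randn(20, 4)
labels = torch.randint(0, 2, (20,))
loader = DataLoader(TensorDataset(features, labels), batch_size=5)
model = nn.Linear(4, 2).to(device)
model.train()
accuracy = evaluate(model, loader, device)
print(f"Accuracy: {accuracy:.2f}%")
```

Restoring the previous mode matters when the function is called inside the training loop: without it, dropout and batch normalization would stay in evaluation mode for the remaining epochs.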
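Method 2: Transfer Learning with VGG16
How Transfer Learning Works
VGG16 is pretrained on ImageNet and has already learned a rich set of image features. We freeze its convolutional layers and train only the classifier.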
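As a small illustration of what freezing does, the sketch below uses a tiny stand-in network rather than VGG16 itself, so it runs without downloading weights. The class `TinyNet` is hypothetical; it only borrows the `features` and `classifier` attribute names from torchvision's VGG16.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with a features/classifier split like VGG16."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(8, 2))

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyNet()
# Freeze the convolutional part: no gradients are computed, so no updates
for param in model.features.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable: {trainable}, frozen: {frozen}")

# After a backward pass only the classifier parameters hold gradients
model(torch.randn(2, 3, 8, 8)).sum().backward()
print([p.grad is None for p in model.features.parameters()])
```

The same pattern applies to the real model: only the unfrozen classifier weights receive gradients, which is why each epoch updates far fewer parameters than training the whole network.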
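import torchvision.models as models

num_epochs = 10
batch_size = 10
learning_rate = 0.001

# Load the pretrained model
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the convolutional layers
for param in model.features.parameters():
    param.requires_grad = False

# Replace the last classifier layer so the model outputs 2 classes.
# Assigning to out_features alone would leave the 1000-class weights in place.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
# Optimize only the parameters that are still trainable
optimizer = optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=learning_rate
)

# Training loop (similar to the CNN one)
# ...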
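One detail is easy to get wrong when changing the number of output classes: the weights of an `nn.Linear` layer are allocated when the layer is constructed, so the layer has to be replaced rather than edited. The sketch below shows this on a small stand-in layer (the last VGG16 classifier layer maps 4096 inputs to 1000 classes). The layer sizes and variable names are assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn

# Small stand-in for the last classifier layer
head = nn.Linear(16, 10)

# Assigning the attribute only changes a number; the weight stays 10 x 16
head.out_features = 2
print(head.weight.shape, head(torch.randn(1, 16)).shape)

# Replacing the layer allocates new, randomly initialized 2 x 16 weights
new_head = nn.Linear(head.in_features, 2)
print(new_head.weight.shape, new_head(torch.randn(1, 16)).shape)

# Pass only the trainable parameters to the optimizer
frozen = nn.Linear(16, 16)
for param in frozen.parameters():
    param.requires_grad = False
model = nn.Sequential(frozen, nn.ReLU(), new_head)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=0.001)
print(len(trainable))
```

Filtering on `requires_grad` is optional, since parameters without gradients are skipped during the update, but it makes explicit which weights are being trained.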
Experimental Results
| Method | Epochs | Accuracy |
|---|---|---|
| CNN | 5 | 67.64% |
| CNN | 10 | 74.92% |
| CNN | 15 | 73.42% |
| CNN | 20 | 79.28% |
| CNN | 25 | 78.28% |
| VGG16 | 5 | 86.50% |
| VGG16 | 10 | 86.98% |
| VGG16 | 15 | 85.42% |
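Analysis of Results
CNN: accuracy rises unevenly with more epochs (it dips to 73.42% at 15 epochs), peaks at 79.28% at 20 epochs, and falls to 78.28% at 25 epochs, which suggests the model is starting to overfit.
VGG16: it reaches 86.50% after only 5 epochs, well above the CNN's best result at 20 epochs, and 86.98% after 10 epochs before dropping to 85.42% at 15. In this experiment, transfer learning improved both training efficiency and final accuracy.
Conclusion: when training data is limited, transfer learning from a pretrained model is the better choice.
Summary
This article implemented two approaches to cat vs. dog classification in PyTorch:
- Custom CNN: learns features from scratch and needs more epochs and more data
- VGG16 transfer learning: reuses pretrained knowledge and reaches high accuracy quickly
For real projects, consider transfer learning first, especially when the dataset is small.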
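Since accuracy does not improve monotonically with more epochs, it helps to pick the best checkpoint per method instead of the last one. The sketch below does this for the numbers in the results table. The dictionary layout and variable names are assumptions for illustration.

```python
# Accuracy (%) by epoch count, copied from the results table
results = {
    "CNN": {5: 67.64, 10: 74.92, 15: 73.42, 20: 79.28, 25: 78.28},
    "VGG16": {5: 86.50, 10: 86.98, 15: 85.42},
}

best = {}
for method, history in results.items():
    epoch = max(history, key=history.get)  # epoch count with the highest accuracy
    best[method] = (epoch, history[epoch])
    print(f"{method}: best at {epoch} epochs with {history[epoch]:.2f}%")

gap = best["VGG16"][1] - best["CNN"][1]
print(f"gap between the best checkpoints: {gap:.2f} percentage points")
```

In a training script, the same comparison can decide when to save model weights, so that a later, worse epoch does not overwrite the best one.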