I trained a ResNet18 model and saved it to a .pth file. When I try to load it, I get the error below; it continues for a couple more lines in the same pattern.
Error loading checkpoint: Error(s) in loading state_dict for ResNet:
size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
size mismatch for layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
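Looking at the shapes in the error, the checkpoint has 1x1 kernels where the current model expects 3x3, and 256 input channels inside layer1. In torchvision that pattern matches the Bottleneck block used by ResNet50/101/152, whereas ResNet18/34 use BasicBlock. Below is a minimal sketch of how this could be checked; `guess_block_type` is a hypothetical helper (not a PyTorch API), and it assumes the state_dict uses torchvision's key names.

```python
def guess_block_type(shapes):
    """Guess the torchvision ResNet block type from parameter shapes.

    `shapes` maps state_dict keys to shape tuples, e.g.
    {name: tuple(t.shape) for name, t in checkpoint.items()}.
    Hypothetical helper for illustration only, not part of PyTorch.
    """
    key = "layer1.0.conv1.weight"
    if key not in shapes:
        return "unknown"
    kernel = tuple(shapes[key][-2:])
    # Bottleneck blocks carry a third conv (conv3); BasicBlock has only two.
    has_conv3 = "layer1.0.conv3.weight" in shapes
    if kernel == (1, 1) or has_conv3:
        return "Bottleneck"
    if kernel == (3, 3):
        return "BasicBlock"
    return "unknown"


# Shapes copied from the error message above.
checkpoint_shapes = {
    "layer1.0.conv1.weight": (64, 64, 1, 1),
    "layer1.1.conv1.weight": (64, 256, 1, 1),
    "layer2.0.conv1.weight": (128, 256, 1, 1),
}
model_shapes = {
    "layer1.0.conv1.weight": (64, 64, 3, 3),
    "layer1.1.conv1.weight": (64, 64, 3, 3),
    "layer2.0.conv1.weight": (128, 64, 3, 3),
}

print(guess_block_type(checkpoint_shapes))  # Bottleneck
print(guess_block_type(model_shapes))       # BasicBlock
```

If the real checkpoint also reports Bottleneck, rebuilding the model with a matching constructor and the same 5-class fc layer before calling load_state_dict may work. Which depth that is (for example models.resnet50) is an assumption, since it cannot be read from these few lines alone. Note that strict=False only tolerates missing or unexpected keys; it does not skip size mismatches.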
This is my code for training the original model:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models

teacher = models.resnet18(pretrained=True)
# Replace the final layer with a 5-class head
num_features = teacher.fc.in_features
teacher.fc = nn.Linear(num_features, 5)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(teacher.parameters(), lr=0.0001, momentum=0.9, weight_decay=0.0001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
def train_and_evaluate(model, train_loader, val_loader, criterion, optimizer, num_epochs, lambda_l1, learning_rate):
    for epoch in range(num_epochs):
        model.train()
        for images, labels in train_loader:
            # Filter out class 2 samples during training
            mask = labels != 2
            images, labels = images[mask], labels[mask]
            if len(labels) == 0:
                continue
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            total_loss = loss
            total_loss.backward()
            optimizer.step()
        # Evaluate on both training and validation sets, excluding class 2
        train_loss, train_accuracy = evaluate_model(model, train_loader, criterion)
        val_loss, val_accuracy = evaluate_model(model, val_loader, criterion)
        print(f"Epoch {epoch+1} - Training Loss: {train_loss:.4f}, Training Accuracy: {train_accuracy:.4%}, Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_accuracy:.4%}")
I know the logic for excluding class 2 is a bit messy, but I really need to recover this model because training it took so long.
Also this is how I'm loading the model:
checkpoint = torch.load("model968acc.pth", map_location="cpu")
teacher.load_state_dict(checkpoint, strict=False)
If there is no hope of recovering this checkpoint, how would you recommend saving the model in the future so that I don't have trouble loading it?
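For context, one common pattern for avoiding this kind of mismatch is to store the architecture name and head size next to the weights, so the loading code can rebuild the same model before calling load_state_dict. Below is a minimal sketch using a small stand-in model. The names `build_model`, `save_checkpoint`, `load_checkpoint`, and the `arch`/`num_classes` keys are hypothetical and not a PyTorch convention.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model(arch, num_classes):
    """Hypothetical factory. In the real script this would map a name such as
    "resnet18" to the torchvision constructor and replace the fc layer."""
    if arch == "tiny_linear":
        return nn.Linear(8, num_classes)
    raise ValueError(f"unknown architecture: {arch}")


def save_checkpoint(model, arch, num_classes, path):
    # Store metadata alongside the weights so the loader knows what to build.
    torch.save(
        {"arch": arch, "num_classes": num_classes, "state_dict": model.state_dict()},
        path,
    )


def load_checkpoint(path):
    ckpt = torch.load(path, map_location="cpu")
    model = build_model(ckpt["arch"], ckpt["num_classes"])
    # strict=True (the default) raises on any missing or unexpected key.
    model.load_state_dict(ckpt["state_dict"])
    return model, ckpt["arch"]


model = build_model("tiny_linear", num_classes=5)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")
    save_checkpoint(model, "tiny_linear", 5, path)
    restored, arch = load_checkpoint(path)
```

With this layout, a checkpoint saved from a different architecture fails in `build_model` or reports its own `arch` string, instead of surfacing later as a wall of size mismatches.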