Fine-Tuning EfficientNet-B0 for Painting Style Classification

Updated on Aug 15, 2025

Introduction

I fine-tuned EfficientNet-B0 to classify artworks into 9 painting styles using the Hugging Face dataset keremberke/painting-style-classification.
The aim was to build a fully custom PyTorch training pipeline (dataset preparation, augmentation, transfer learning, and evaluation) and to understand both what works and what limits accuracy on this task.

👉 Model card: milliyin/painting-style-classification
https://huggingface.co/milliyin/painting-style-classification

Dataset Preparation

The full dataset (not the mini split) was downloaded directly from Hugging Face in ZIP format for train, validation, and test splits. I created a folder structure like:

dataset/
  images/train
  images/validation
  images/test
  jsonl/train.jsonl
  jsonl/validation.jsonl
  jsonl/test.jsonl

Images were extracted, renamed with zero-padded IDs, and assigned numeric labels based on their original folder names (e.g., baroque → 4, renaissance → 5, surrealism → 8).
I also generated .jsonl files containing metadata for each split (image ID, label, split type) and a combined JSONL for all splits.
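
The metadata step above can be sketched as follows. This is a minimal illustration, not the notebook's code: the field names (image_id, label, split), the 6-digit ID width, and the helper names are assumptions, and only the three label pairs quoted in the text are included in the mapping.

```python
import json
from pathlib import Path

# Only the three pairs mentioned in the text; the other six styles are omitted.
STYLE_TO_LABEL = {"baroque": 4, "renaissance": 5, "surrealism": 8}


def build_records(style_dirs, split, start_id=0, id_width=6):
    """Create one metadata record per image.

    style_dirs maps a style folder name to the image filenames inside it.
    Field names and the zero-padding width are assumptions for illustration.
    """
    records = []
    next_id = start_id
    for style in sorted(style_dirs):
        for _original_name in sorted(style_dirs[style]):
            records.append({
                "image_id": f"{next_id:0{id_width}d}",  # zero-padded ID
                "label": STYLE_TO_LABEL[style],
                "split": split,
            })
            next_id += 1
    return records


def write_jsonl(records, path):
    """Write records to path, one JSON object per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping one record per line makes it easy to concatenate the per-split files into the combined JSONL.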

To work with the data easily, I implemented a custom dataset loader (FolderDataset) that reads these JSONL files and exposes each split by name, e.g. dataset['train'].
A second wrapper (PaintingDataset) applied transforms and returned (image, label) pairs for PyTorch.
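
A rough sketch of how these two classes could fit together is shown below. The class names come from the text, but everything inside them is an assumption: the record fields, the .jpg extension, and the on-disk layout images/<split>/<image_id>.jpg are guesses based on the folder structure above.

```python
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class FolderDataset:
    """Loads <root>/jsonl/<split>.jsonl and exposes splits by name."""

    def __init__(self, root, splits=("train", "validation", "test")):
        self.root = Path(root)
        self.splits = {}
        for split in splits:
            jsonl_path = self.root / "jsonl" / f"{split}.jsonl"
            with jsonl_path.open(encoding="utf-8") as f:
                self.splits[split] = [json.loads(line) for line in f if line.strip()]

    def __getitem__(self, split):
        # dataset['train'] returns the list of metadata records for that split.
        return self.splits[split]

    def image_path(self, record):
        # Assumed layout: <root>/images/<split>/<image_id>.jpg
        return self.root / "images" / record["split"] / f"{record['image_id']}.jpg"


class PaintingDataset(Dataset):
    """Wraps one split and returns (image, label) pairs for a DataLoader."""

    def __init__(self, folder_dataset, split, transform=None):
        self.folder_dataset = folder_dataset
        self.records = folder_dataset[split]
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        record = self.records[index]
        image = Image.open(self.folder_dataset.image_path(record)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, record["label"]
```

Separating metadata loading from the PyTorch wrapper means the same records can be reused with different transforms for training and evaluation.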

Data Augmentation

For training, I applied:

Resize to 224×224 (EfficientNet-B0 input size)
Random horizontal flip (50% probability)
Random rotation up to 15°
Color jitter (brightness, contrast, saturation, hue)
Random affine translation
Normalization to ImageNet stats

For validation and test, only resizing and normalization were applied.

This augmentation strategy was designed to help the model generalize from ~4k images without overfitting.

Model Architecture

I started from torchvision.models.efficientnet_b0 with ImageNet (IMAGENET1K_V1) pretrained weights.
The final classifier layer was replaced with:

Dropout (0.2)
Fully-connected layer to 9 output classes (matching dataset styles)

Transfer Learning Strategy

I froze all layers up to ~layer 100 at the start to speed up convergence and avoid catastrophic forgetting.
The plan was to gradually unfreeze:

Epoch 10: unfreeze more layers (freeze_until_layer=50)
Epoch 20: unfreeze all layers for full fine-tuning

This step-wise unfreezing lets the classifier head adapt before the earlier convolutional blocks are updated.
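
One way to express this schedule in code is sketched below. It rests on two assumptions: that a "layer" means an entry in model.parameters() (the notebook may count modules instead), and that epochs are counted from zero. Both helper names are hypothetical.

```python
import torch.nn as nn


def freeze_until_for_epoch(epoch):
    """Map an epoch index to the freeze boundary described in the text."""
    if epoch < 10:
        return 100  # initial phase: roughly the first 100 layers frozen
    if epoch < 20:
        return 50   # from epoch 10: unfreeze more layers
    return 0        # from epoch 20: full fine-tuning


def apply_freeze(model: nn.Module, freeze_until_layer: int) -> int:
    """Freeze parameter tensors with index below freeze_until_layer.

    Returns the number of parameter tensors left trainable.
    Assumption: layers are indexed by position in model.parameters().
    """
    trainable = 0
    for index, param in enumerate(model.parameters()):
        param.requires_grad = index >= freeze_until_layer
        trainable += int(param.requires_grad)
    return trainable
```

Calling apply_freeze(model, freeze_until_for_epoch(epoch)) at the start of each epoch would apply the schedule. Note that setting requires_grad to False does not stop BatchNorm layers from updating their running statistics while the model is in training mode.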

Training Setup

Loss Function: CrossEntropyLoss with label smoothing (label_smoothing=0.1)
Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
Scheduler: ReduceLROnPlateau (monitors validation accuracy, reduces LR by factor of 0.5 after 5 epochs without improvement)
Batch Size: 32
Epochs: 50
Device: CUDA
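
The settings listed above translate to PyTorch roughly as follows. The hyperparameters come from the list; the helper name is hypothetical, and mode="max" is my inference from the scheduler monitoring accuracy rather than loss.

```python
import torch
import torch.nn as nn


def build_training_objects(model):
    """Create the loss, optimizer, and LR scheduler with the listed settings."""
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    # Frozen parameters receive no gradients, so AdamW simply skips them.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
    # mode="max" because the monitored metric (validation accuracy) should rise.
    # With patience=5, PyTorch tolerates 5 stagnant epochs and halves the LR
    # on the next one.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=5
    )
    return criterion, optimizer, scheduler
```

The scheduler is stepped once per epoch with the monitored value, e.g. scheduler.step(val_acc).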

The training loop tracked train loss/accuracy and validation loss/accuracy each epoch.
If the validation accuracy improved, the model was saved as best_efficientnet_b0.pth.
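
A compact sketch of such a loop is given below. It is an illustration under assumptions rather than the notebook's code: the function names are mine, and I save only the state_dict, whereas the original may have saved the whole model.

```python
import torch


def run_epoch(model, loader, criterion, device, optimizer=None):
    """Run one pass over loader; trains if an optimizer is given.

    Returns (mean loss, accuracy) over all samples seen.
    """
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = criterion(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


def fit(model, train_loader, val_loader, criterion, optimizer, scheduler,
        device, epochs=50, checkpoint_path="best_efficientnet_b0.pth"):
    """Train for `epochs`, checkpointing whenever validation accuracy improves."""
    best_val_acc = 0.0
    history = []
    for epoch in range(epochs):
        train_loss, train_acc = run_epoch(model, train_loader, criterion,
                                          device, optimizer)
        val_loss, val_acc = run_epoch(model, val_loader, criterion, device)
        scheduler.step(val_acc)  # ReduceLROnPlateau monitors validation accuracy
        history.append({"epoch": epoch, "train_loss": train_loss,
                        "train_acc": train_acc, "val_loss": val_loss,
                        "val_acc": val_acc})
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            torch.save(model.state_dict(), checkpoint_path)
    return best_val_acc, history
```

The returned history list holds the per-epoch values needed to plot loss and accuracy curves afterwards.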

Evaluation & Results

The best validation accuracy achieved was:

✅ 60.15% after 50 epochs

I also generated a classification report and plotted loss/accuracy curves to analyze overfitting patterns.
Inference was tested on individual images with top-1 predicted style and confidence score.
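
Single-image inference of this kind can be sketched as below. The function name and the optional class_names argument are hypothetical; the input is assumed to be an image tensor that has already gone through the evaluation transforms.

```python
import torch


@torch.no_grad()
def predict_top1(model, image_tensor, class_names=None):
    """Return (predicted class, confidence) for one preprocessed image tensor.

    The confidence is the softmax probability of the top class. If class_names
    is given, the class is returned as a name, otherwise as an integer index.
    """
    model.eval()
    device = next(model.parameters()).device
    logits = model(image_tensor.unsqueeze(0).to(device))  # add batch dimension
    probs = torch.softmax(logits, dim=1)[0]
    confidence, index = probs.max(dim=0)
    prediction = class_names[index.item()] if class_names else index.item()
    return prediction, confidence.item()
```

Because training used label smoothing, the reported confidences tend to be less peaked than they would be with plain cross-entropy.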

Why Did It Plateau Around 60%?

High Inter-Class Similarity – Certain styles (e.g., Romanticism vs. Realism) share strong visual overlap.
Label Noise – Open datasets may have inconsistent labels.
Data Imbalance – Some styles had fewer samples, causing uneven learning.
Limited Early Unfreezing – Freezing many layers for too long limited domain adaptation from natural photos to paintings.
Moderate Augmentation – Could be stronger to handle variations in scan quality, lighting, and framing.
Model Size – EfficientNet-B0 is compact; larger backbones may better capture fine texture differences.

How to Improve

Earlier & Gradual Unfreezing – Allow backbone adaptation sooner.
Stronger Augmentations – Use RandAugment, CutMix, Mixup, or color-space perturbations.
Class-Balanced Sampling – Reduce bias toward majority classes.
Bigger Backbone – Try EfficientNet-B2/B3, ConvNeXt-Tiny, or ViT models.
Curated Splits – Avoid artist overlap between train/validation to measure generalization accurately.
Test-Time Augmentation (TTA) & Ensembling – Combining predictions from several views or models typically yields small accuracy gains.
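
Two of the ideas above are cheap to try and are sketched here: class-balanced sampling with WeightedRandomSampler, and a simple horizontal-flip TTA. Neither was part of the run reported in this post; the helper names are hypothetical, and flip-only TTA is just one possible choice of views.

```python
from collections import Counter

import torch
from torch.utils.data import WeightedRandomSampler


def make_balanced_sampler(labels):
    """Sampler that draws each class with roughly equal probability.

    Each sample is weighted by the inverse frequency of its class.
    """
    counts = Counter(labels)
    weights = torch.tensor([1.0 / counts[label] for label in labels],
                           dtype=torch.double)
    return WeightedRandomSampler(weights, num_samples=len(labels),
                                 replacement=True)


@torch.no_grad()
def predict_with_hflip_tta(model, images):
    """Average softmax probabilities over the original and mirrored batch.

    images is a batch of shape (N, C, H, W); dim 3 is the width axis.
    """
    model.eval()
    probs = torch.softmax(model(images), dim=1)
    flipped_probs = torch.softmax(model(torch.flip(images, dims=[3])), dim=1)
    return (probs + flipped_probs) / 2
```

The sampler is passed to the DataLoader through its sampler argument, in which case shuffle must be left at its default of False.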

Code Link

You can explore the complete training pipeline, dataset processing, and fine-tuning notebook here:
👉 GitHub Notebook: painting-style-classification-finetune/finetune.ipynb

Benefits of This Project

Custom Dataset Handling – Built a JSONL + folder-based loader for structured control.
End-to-End Pipeline – Covered raw data → augmentation → model training → evaluation → inference.
Transfer Learning Practice – Applied freezing/unfreezing and domain adaptation strategies.
Error Analysis Mindset – Turned a “60% wall” into a checklist of targeted improvements.

Conclusion

This project provided a hands-on look at training image classification models for nuanced visual categories like art styles.
With a solid baseline at ~60% validation accuracy, there is plenty of room to iterate, particularly on augmentation, layer unfreezing, and backbone scaling, to push beyond this mark.
