Fine-Tuning EfficientNet-B0 for Painting Style Classification
Introduction
I fine-tuned EfficientNet-B0 to classify artworks into 9 painting styles using the Hugging Face dataset keremberke/painting-style-classification.
The aim was to build a fully custom PyTorch training pipeline—covering dataset preparation, augmentation, transfer learning, and evaluation—to understand both what works and what limits accuracy for this task.
👉 Model card: milliyin/painting-style-classification
https://huggingface.co/milliyin/painting-style-classification
Dataset Preparation
The full dataset (not the mini split) was downloaded directly from Hugging Face as ZIP archives, one each for the train, validation, and test splits. I organized the extracted data into a folder structure like:
dataset/
  images/train
  images/validation
  images/test
  jsonl/train.jsonl
  jsonl/validation.jsonl
  jsonl/test.jsonl
Images were extracted, renamed with zero-padded IDs, and assigned numeric labels based on their original folder names (e.g., baroque → 4, renaissance → 5, surrealism → 8).
I also generated .jsonl files containing metadata for each split (image ID, label, split type) and a combined JSONL for all splits.
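The renaming and metadata step could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the record fields (image_id, file, label, split), and the six-digit padding are assumptions, and the label map contains only the three mappings quoted above.

```python
import json
import shutil
from pathlib import Path

# Only these three mappings are given in the text; the other six are omitted here.
LABEL_MAP = {"baroque": 4, "renaissance": 5, "surrealism": 8}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def prepare_split(raw_dir, out_image_dir, split, label_map=LABEL_MAP):
    """Copy images from raw_dir/<style>/ into out_image_dir under zero-padded
    names and return one metadata record per copied image."""
    raw_dir, out_image_dir = Path(raw_dir), Path(out_image_dir)
    out_image_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for style in sorted(label_map):
        class_dir = raw_dir / style
        if not class_dir.is_dir():
            continue  # this style folder is not present in the split
        for src in sorted(class_dir.iterdir()):
            if src.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            image_id = f"{len(records):06d}"  # assumed padding width
            file_name = image_id + src.suffix.lower()
            shutil.copy2(src, out_image_dir / file_name)
            records.append({
                "image_id": image_id,
                "file": file_name,
                "label": label_map[style],
                "split": split,
            })
    return records


def write_jsonl(records, out_path):
    """Write one JSON object per line."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

Calling prepare_split once per split and concatenating the returned lists before a final write_jsonl call would give the combined file.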
To work with the data easily, I implemented a custom dataset loader (FolderDataset) that reads these JSONLs and can access splits like dataset['train'].
A second wrapper (PaintingDataset) applied transforms and returned (image, label) pairs for PyTorch.
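For illustration, a wrapper along these lines would do the job. This sketch collapses the two classes into one that reads a split's JSONL directly, so it is not the post's FolderDataset/PaintingDataset pair. The constructor arguments and the "file" and "label" record fields are assumptions.

```python
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class PaintingDataset(Dataset):
    """Returns (image, label) pairs described by one split's JSONL file."""

    def __init__(self, jsonl_path, image_dir, transform=None):
        self.image_dir = Path(image_dir)
        self.transform = transform
        with open(jsonl_path, encoding="utf-8") as f:
            # One JSON record per non-empty line.
            self.records = [json.loads(line) for line in f if line.strip()]

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        record = self.records[index]
        # "file" and "label" are assumed field names.
        with Image.open(self.image_dir / record["file"]) as im:
            image = im.convert("RGB")  # drop alpha / grayscale modes
        if self.transform is not None:
            image = self.transform(image)
        return image, record["label"]
```

An instance can be passed straight to torch.utils.data.DataLoader once the transform returns tensors.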
Data Augmentation
For training, I applied:
- Resize to 224×224 (EfficientNet-B0 input size)
- Random horizontal flip (50% probability)
- Random rotation up to 15°
- Color jitter (brightness, contrast, saturation, hue)
- Random affine translation
- Normalization to ImageNet stats
For validation and test, only resizing and normalization were applied.
This augmentation strategy was designed to help the model generalize from ~4k images without overfitting.
Model Architecture
I started from torchvision.models.efficientnet_b0 with ImageNet (IMAGENET1K_V1) pretrained weights.
The final classifier layer was replaced with:
- Dropout (0.2)
- Fully-connected layer to 9 output classes (matching dataset styles)
Transfer Learning Strategy
I froze all layers up to ~layer 100 at the start to speed up convergence and avoid catastrophic forgetting.
The plan was to gradually unfreeze:
- Epoch 10: unfreeze more layers (freeze_until_layer=50)
- Epoch 20: unfreeze all layers for full fine-tuning
This step-wise unfreezing allowed the classifier head to adapt first before updating earlier convolutional blocks.
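One way to express this schedule in code is sketched below. The post does not say what a "layer" index counts, so this version assumes it indexes the model's parameter tensors in registration order. The helper names are hypothetical, and only freeze_until_layer and the 100 / 50 / 0 thresholds come from the text.

```python
import torch.nn as nn


def freeze_until(model: nn.Module, freeze_until_layer: int) -> int:
    """Freeze the first `freeze_until_layer` parameter tensors and unfreeze
    the rest. Returns the number of trainable tensors afterwards."""
    trainable = 0
    for index, param in enumerate(model.parameters()):
        param.requires_grad = index >= freeze_until_layer
        trainable += int(param.requires_grad)
    return trainable


def freeze_point_for_epoch(epoch: int) -> int:
    """Freeze point for a given epoch, following the plan described above."""
    if epoch >= 20:
        return 0    # everything trainable
    if epoch >= 10:
        return 50
    return 100


# Inside a training loop this would be applied at the start of each epoch:
#     freeze_until(model, freeze_point_for_epoch(epoch))
```

If the optimizer was built only from parameters that were trainable at the start, it has to be rebuilt (or given new parameter groups) after each unfreeze. Building it once over model.parameters() avoids that, since parameters without gradients are skipped.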
Training Setup
- Loss Function: CrossEntropyLoss with label smoothing (label_smoothing=0.1)
- Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
- Scheduler: ReduceLROnPlateau (monitors validation accuracy, reduces LR by factor of 0.5 after 5 epochs without improvement)
- Batch Size: 32
- Epochs: 50
- Device: CUDA
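The loss, optimizer, and scheduler settings listed above translate to PyTorch roughly as follows. The wrapper function name is hypothetical. mode="max" is used because the scheduler watches accuracy rather than loss.

```python
import torch
import torch.nn as nn


def make_training_objects(model: nn.Module):
    """Loss, optimizer, and LR scheduler with the hyperparameters listed above."""
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
    # Halve the LR once validation accuracy has not improved for more than
    # 5 consecutive epochs; "max" because higher accuracy is better.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=5
    )
    return criterion, optimizer, scheduler
```

The scheduler is then stepped once per epoch with the monitored value, e.g. scheduler.step(val_acc).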
The training loop tracked train loss/accuracy and validation loss/accuracy each epoch.
If the validation accuracy improved, the model was saved as best_efficientnet_b0.pth.
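A compact version of such a loop is sketched here, under the assumption that the loaders yield (images, labels) batches and that the scheduler is a ReduceLROnPlateau stepped on validation accuracy. The function names and the history format are illustrative, not taken from the notebook.

```python
import torch


def run_epoch(model, loader, criterion, device, optimizer=None):
    """One pass over `loader`. Trains when an optimizer is given, otherwise
    evaluates. Returns (mean loss, accuracy)."""
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = criterion(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


def fit(model, train_loader, val_loader, criterion, optimizer, scheduler,
        device, epochs=50, ckpt_path="best_efficientnet_b0.pth"):
    """Train for `epochs` epochs, saving weights when validation accuracy improves."""
    model.to(device)
    best_acc, history = 0.0, []
    for epoch in range(1, epochs + 1):
        train_loss, train_acc = run_epoch(model, train_loader, criterion, device, optimizer)
        val_loss, val_acc = run_epoch(model, val_loader, criterion, device)
        scheduler.step(val_acc)  # plateau detection on validation accuracy
        history.append({"epoch": epoch, "train_loss": train_loss, "train_acc": train_acc,
                        "val_loss": val_loss, "val_acc": val_acc})
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), ckpt_path)
    return best_acc, history
```

The returned history list holds everything needed to plot loss and accuracy curves afterwards.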
Evaluation & Results
The best validation accuracy achieved was:
✅ 60.15% after 50 epochs
I also generated a classification report and plotted loss/accuracy curves to analyze overfitting patterns.
Inference was tested on individual images with top-1 predicted style and confidence score.
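Single-image inference of this kind can be sketched as below. The helper name and the class_names argument are assumptions. The image is expected to be a tensor that has already been through the evaluation transform.

```python
import torch


@torch.no_grad()
def predict_top1(model, image_tensor, class_names):
    """Return (style name, confidence) for one preprocessed image tensor
    of shape (C, H, W)."""
    model.eval()
    device = next(model.parameters()).device
    logits = model(image_tensor.unsqueeze(0).to(device))  # add batch dimension
    probs = torch.softmax(logits, dim=1)[0]
    confidence, index = probs.max(dim=0)
    return class_names[index.item()], confidence.item()
```

The confidence here is the softmax probability of the top class. With label smoothing during training, these values tend to sit below 1.0 even for easy examples.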
Why Did It Plateau Around ~60%?
- High Inter-Class Similarity – Certain styles (e.g., Romanticism vs. Realism) share strong visual overlap.
- Label Noise – Open datasets may have inconsistent labels.
- Data Imbalance – Some styles had fewer samples, causing uneven learning.
- Limited Early Unfreezing – Freezing many layers for too long limited domain adaptation from natural photos to paintings.
- Moderate Augmentation – Could be stronger to handle variations in scan quality, lighting, and framing.
- Model Size – EfficientNet-B0 is compact; larger backbones may better capture fine texture differences.
How to Improve
- Earlier & Gradual Unfreezing – Allow backbone adaptation sooner.
- Stronger Augmentations – Use RandAugment, CutMix, Mixup, or color-space perturbations.
- Class-Balanced Sampling – Reduce bias toward majority classes.
- Bigger Backbone – Try EfficientNet-B2/B3, ConvNeXt-Tiny, or ViT models.
- Curated Splits – Avoid artist overlap between train/validation to measure generalization accurately.
- TTA & Ensembling – Small accuracy gains from combining predictions.
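As one concrete example from this list, class-balanced sampling can be approximated with PyTorch's WeightedRandomSampler by weighting each sample with the inverse of its class frequency. This is a sketch of a suggested improvement, not something the original pipeline used; the helper name is hypothetical.

```python
from collections import Counter

import torch
from torch.utils.data import WeightedRandomSampler


def make_balanced_sampler(labels):
    """Sampler that draws each class with roughly equal probability.

    `labels` is the list of integer labels for the training set, in
    dataset order."""
    counts = Counter(labels)
    # Inverse-frequency weight per sample: rare classes are drawn more often.
    weights = torch.tensor([1.0 / counts[y] for y in labels], dtype=torch.double)
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```

The sampler is passed to DataLoader via sampler=..., in which case shuffle must be left unset. Because sampling is with replacement, minority-class images repeat within an epoch, so strong augmentation matters more.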
Code Link
You can explore the complete training pipeline, dataset processing, and fine-tuning notebook here:
👉 GitHub Notebook: painting-style-classification-finetune/finetune.ipynb
Benefits of This Project
- Custom Dataset Handling – Built a JSONL + folder-based loader for structured control.
- End-to-End Pipeline – Covered raw data → augmentation → model training → evaluation → inference.
- Transfer Learning Practice – Applied freezing/unfreezing and domain adaptation strategies.
- Error Analysis Mindset – Turned a “60% wall” into a checklist of targeted improvements.
Conclusion
This project provided a hands-on look at training image classification models for nuanced visual categories like art styles.
With a solid baseline at ~60% validation accuracy, there’s plenty of room to iterate—particularly on augmentation, layer unfreezing, and backbone scaling—to push well beyond this mark.