Transfer Learning for Multi-Class Scene Classification
Fine-tuned pre-trained models (ResNet, EfficientNet, VGG) for 6-class scene recognition with data augmentation in TensorFlow/Keras.
Overview
This project applies transfer learning to classify images into 6 scene categories (buildings, forest, glacier, mountain, sea, street), using pre-trained ResNet50, ResNet101V2, EfficientNetB0, and VGG16 backbones on a training set of ~14,000 images.
Implemented in TensorFlow/Keras, it incorporates data augmentation, early stopping, and multi-class metrics, with VGG16 achieving the highest F1-score of 0.886. The work demonstrates efficient handling of modest datasets through feature extraction and fine-tuning.
Dataset
The Intel Image Classification dataset contains ~17,000 RGB images (150x150 pixels) across 6 natural scene classes, sourced from various global locations. It is split into training (14,034 images) and test (3,000 images) sets, with the following class distribution: buildings (2,627), forest (2,745), glacier (2,957), mountain (3,037), sea (2,784), street (2,883). Images were resized to 224x224 and augmented for robustness. For more details, refer to the dataset documentation.
Core Challenge: Handling mild class imbalance and visually similar scene categories (e.g., glacier vs. mountain) through transfer learning and augmentation.
Methodology
The pipeline starts with data preprocessing and augmentation, followed by model creation on frozen pre-trained bases, and ends with training and evaluation.
Stage 1: Data Preprocessing and Augmentation
Images are augmented to improve robustness:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation pipeline: rescale pixels to [0, 1] and apply random geometric
# transforms; 20% of the training folder is held out for validation.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode="nearest",
    validation_split=0.2
)

# Training subset: images are resized to 224x224 and labels are one-hot encoded.
train_generator = train_datagen.flow_from_directory(
    "data/seg_train",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
    subset="training"
)
- Validation uses the held-out 20% split; the test generator applies only rescaling (no augmentation) for fair evaluation, as sketched below.
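A minimal sketch of the matching validation and test generators; the `data/seg_test` path is an assumption about the test folder layout:

# Validation subset: drawn from the same generator, so it passes through the
# same pipeline (a separate rescale-only generator with the same
# validation_split is a common alternative).
validation_generator = train_datagen.flow_from_directory(
    "data/seg_train",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
    subset="validation"
)

# Test data gets only rescaling, no augmentation; shuffle=False keeps the
# prediction order aligned with the true labels for evaluation.
test_datagen = ImageDataGenerator(rescale=1.0 / 255)
test_generator = test_datagen.flow_from_directory(
    "data/seg_test",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
    shuffle=False
)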
 
Stage 2: Model Creation with Transfer Learning
Pre-trained models are adapted by freezing base layers and adding custom heads:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, BatchNormalization, Dropout, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2

def create_model(base_model_class, learning_rate=1e-4):
    # Load the ImageNet-pretrained backbone without its classification head.
    base_model = base_model_class(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    # Freeze the backbone so only the custom head is trained (feature extraction).
    for layer in base_model.layers:
        layer.trainable = False
    model = Sequential([
        base_model,
        GlobalAveragePooling2D(),
        BatchNormalization(),
        Dropout(0.2),
        Dense(256, activation="relu", kernel_regularizer=l2(0.001)),
        Dense(6, activation="softmax")  # 6 scene classes
    ])
    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model
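A usage sketch, assuming the standard `tensorflow.keras.applications` imports, building one frozen-base model per backbone from the same factory:

from tensorflow.keras.applications import ResNet50, ResNet101V2, EfficientNetB0, VGG16

# One model per architecture, all sharing the custom head defined above.
models = {
    "resnet50": create_model(ResNet50),
    "resnet101v2": create_model(ResNet101V2),
    "efficientnetb0": create_model(EfficientNetB0),
    "vgg16": create_model(VGG16),
}
models["vgg16"].summary()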
Stage 3: Training and Evaluation
Models train for up to 50 epochs with learning-rate scheduling, early stopping, and checkpointing:
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

def get_callbacks(model_name):
    return [
        # Reduce the learning rate when validation loss plateaus.
        ReduceLROnPlateau(monitor="val_loss", factor=0.3, patience=5, min_lr=1e-6),
        # Stop early and restore the best weights once validation loss stops improving.
        EarlyStopping(monitor="val_loss", patience=7, restore_best_weights=True),
        # Keep the best checkpoint on disk.
        ModelCheckpoint(filepath=f"{model_name}_best_model.keras", monitor="val_loss", save_best_only=True)
    ]

history = model.fit(
    train_generator,
    validation_data=validation_generator,
    epochs=50,
    callbacks=get_callbacks(model_name)
)
- Evaluation computes precision, recall, F1 score, and AUC on the held-out test set.
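A sketch of how these metrics could be computed with scikit-learn; the averaging mode (`weighted`) and the one-vs-rest AUC are assumptions about how the reported numbers were obtained:

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Class probabilities over the unshuffled test generator.
y_prob = model.predict(test_generator)
y_pred = np.argmax(y_prob, axis=1)
y_true = test_generator.classes  # order is valid because shuffle=False

precision = precision_score(y_true, y_pred, average="weighted")
recall = recall_score(y_true, y_pred, average="weighted")
f1 = f1_score(y_true, y_pred, average="weighted")
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")  # one-vs-rest AUC
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} auc={auc:.4f}")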
Results
| Model | Precision | Recall | F1 Score | AUC |
|---|---|---|---|---|
| ResNet50 | 0.7306 | 0.7280 | 0.7262 | 0.9441 |
| ResNet101V2 | 0.7003 | 0.6997 | 0.6979 | 0.9333 |
| EfficientNetB0 | 0.0494 | 0.1673 | 0.0761 | 0.7283 |
| VGG16 | 0.8863 | 0.8857 | 0.8856 | 0.9881 |
- VGG16 outperforms others, with steady convergence and minimal overfitting.
 
Key Insights
- VGG16 Superiority: Its frozen convolutional features transfer best to this dataset, reaching an F1-score of 0.886 despite the relatively small training set.
- EfficientNetB0 Struggles: A recall of 0.167 is roughly chance level for six classes, suggesting it is less suited to this setup without full fine-tuning; a likely contributing factor is the 1/255 rescaling, since Keras EfficientNet models expect raw [0, 255] inputs (see the sketch after this list).
- Augmentation Impact: Augmentation helps mitigate the mild class imbalance, but visually similar classes (e.g., glacier vs. mountain) keep the scores short of perfect.
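If the preprocessing mismatch is indeed the cause, one possible fix (a sketch, not a verified result) is to drop the 1/255 rescaling for EfficientNet and use its model-specific `preprocess_input`, which is a pass-through because normalization is built into the Keras EfficientNet models:

from tensorflow.keras.applications.efficientnet import preprocess_input

# EfficientNet (tf.keras.applications) expects raw [0, 255] pixels -- its
# rescaling/normalization layers are part of the model -- so no rescale=1/255 here.
effnet_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,  # pass-through for EfficientNet
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode="nearest",
    validation_split=0.2
)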
 
Learnings
- Why is transfer learning effective for small datasets, and how do pre-trained features help?
- When should layers stay frozen versus fine-tuning the entire model? (A fine-tuning sketch follows this list.)
- How can class imbalance be handled with augmentation and a careful choice of evaluation metrics?
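As an illustrative answer to the second question, a common follow-up step (not part of the reported runs) is to unfreeze the top of the best backbone and continue training at a lower learning rate:

# Hypothetical fine-tuning step: unfreeze the last few layers of the frozen
# base and re-compile with a smaller learning rate. Layer counts and epoch
# budget here are illustrative, not reported settings.
base_model = model.layers[0]           # the pre-trained backbone inside Sequential
base_model.trainable = True
for layer in base_model.layers[:-4]:   # keep all but the last few layers frozen
    layer.trainable = False

model.compile(
    optimizer=Adam(learning_rate=1e-5),  # 10x smaller than the feature-extraction phase
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)
model.fit(train_generator, validation_data=validation_generator, epochs=10,
          callbacks=get_callbacks("vgg16_finetune"))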
 
All the above questions were answered through this project, and it was a great learning experience.
This project highlights efficient use of transfer learning for practical image classification, with clear potential for extension to larger datasets.