diff --git a/recognition/README.md b/recognition/README.md deleted file mode 100644 index 32c99e899..000000000 --- a/recognition/README.md +++ /dev/null @@ -1,10 +0,0 @@ -# Recognition Tasks -Various recognition tasks solved in deep learning frameworks. - -Tasks may include: -* Image Segmentation -* Object detection -* Graph node classification -* Image super resolution -* Disease classification -* Generative modelling with StyleGAN and Stable Diffusion \ No newline at end of file diff --git a/recognition/siamese/.gitignore b/recognition/siamese/.gitignore new file mode 100644 index 000000000..1bc2eea3a --- /dev/null +++ b/recognition/siamese/.gitignore @@ -0,0 +1,5 @@ +#ignore dataset files +dataset/ +test.py +__pycache__/ +models/ \ No newline at end of file diff --git a/recognition/siamese/README.md b/recognition/siamese/README.md new file mode 100644 index 000000000..0a5bb8261 --- /dev/null +++ b/recognition/siamese/README.md @@ -0,0 +1,262 @@ +# Siamese Network for ISIC 2020 Skin Lesion Classification +**Author:** s4778251 + +

+<img src="./images/Siamese Network.webp" alt="Siamese Network">
+
+ +## Description + +This repository implements a **Siamese Network** for **binary classification** of dermoscopic images from the **ISIC 2020 Challenge** dataset (melanoma vs. benign). +The approach first trains a **Siamese encoder** using **Triplet Margin Loss** to learn a discriminative embedding space, and then trains a **binary classifier** (4-layer MLP) on top of frozen embeddings for final predictions. +The implementation follows a modular design, with configuration centralized in `params.py`, dataset management in `dataset.py`, and the main training logic in `train.py`. + + + +## How It Works + +### Siamese Encoder +- Backbone: **ResNet-50** pretrained on ImageNet. +- The final fully connected layer is replaced by a **512-dimensional projection head**. +- Embeddings are **L2-normalized** to enforce metric consistency. +- Optimized with **Triplet Margin Loss**, which minimizes the distance between anchor-positive pairs and maximizes distance to negatives. + +### Binary Classifier +- Takes embeddings extracted from the Siamese encoder as input. +- Composed of two hidden layers: 256 → 64 units. +- Uses **LeakyReLU activation** and **Dropout (p=0.4)** for regularization. +- Trained with **CrossEntropyLoss** to distinguish between benign and malignant samples. + +### Evaluation +- After training, the encoder and classifier are evaluated on the test set. +- The model reports overall accuracy, confusion matrix, and per-class precision, recall, and F1-score. +- All plots (training curves, confusion matrix) are saved under `./images/`. + + + +## Project Structure + +``` +siamese/ +├── dataset.py # Data loading and preprocessing pipeline +├── modules.py # Model definitions (SiameseEncoder, BinaryClassifier) +├── train.py # Training pipeline for Siamese and classifier networks +├── predict.py # Evaluation and testing (confusion matrix, metrics) +├── utils.py # Utility functions for plotting, saving samples, feature extraction, etc. +├── params.py # Global configuration (hyperparameters, paths, augmentation, etc.) +└── models/ # Folder for saved models (.pth) + ├── siamese.pth + ├── classifier.pth +└── images/ # Folder for saved output figures + ├── siamese_loss.png + ├── classifier_loss.png + ├── confusion_matrix.png + └── input_sample.png +└── dataset/ # Dataset + ├── train-image/ + ├── train-metadata.csv +``` + + +## File Explanations + +- **params.py** – Stores all global variables and hyperparameters, including dataset paths, image preprocessing, model dimensions, and training settings. +- **dataset.py** – Defines dataset classes, data augmentation, and loaders for both triplet and classification tasks. +- **modules.py** – Contains the model definitions: the Siamese encoder (ResNet-50) and binary classifier (4-layer MLP). +- **utils.py** – Includes helper functions for plotting, saving figures, feature extraction, and directory creation. +- **train.py** – Main training script that trains the Siamese encoder, extracts embeddings, and trains the classifier. +- **predict.py** – Evaluation script that loads trained models, computes predictions, and saves the confusion matrix. + + + +## Dependencies +``` +Tested on Google Colab (CUDA 12.6). 
+ +| Package | Version | +|----------------|----------------| +| torch | 2.8.0+cu126 | +| torchvision | 0.23.0+cu126 | +| numpy | 2.0.2 | +| pandas | 2.2.2 | +| matplotlib | 3.10.0 | +| scikit-learn | 1.6.1 | +``` + + +## Data Preprocessing + +- Input: **256×256 RGB** dermoscopic images (`train-image/`) +- Metadata: `train-metadata.csv` (containing `isic_id`, `patient_id`, `target`) +- Split: **70% train / 10% validation / 20% test**, grouped by patient ID to prevent data leakage. +- Normalization: `mean = [0.5, 0.5, 0.5]`, `std = [0.5, 0.5, 0.5]`. +- Augmentation: random rotations, color jitter, horizontal/vertical flips. + +All preprocessing configurations and split ratios are defined in `params.py` for reproducibility. + +### Justification of Data Splits +A 70 / 10 / 20 (train / validation / test) split was selected to maintain a balance between model generalization and evaluation stability. +Group-based splitting by `patient_id` prevents data leakage between training and test sets, as multiple images can originate from the same patient. + + + +## Training and Testing + +All experiments were conducted in **Google Colab A100**. +Before running, ensure that the working directory is correctly set to the project folder. + + +### Train Both Networks +``` +%cd /content/siamese +!python train.py +``` + +#### This command will: +- Train the Siamese encoder using **Triplet Margin Loss** +- Extract embeddings from the encoder +- Train the binary classifier using **CrossEntropyLoss** +- Save model weights and training plots under `./models/` and `./images/` + + +### Evaluate on Test Set +``` +%cd /content/siamese +!python predict.py +``` + +#### This command loads the trained models and: +- Evaluates performance on the test dataset +- Computes accuracy, precision, recall, and F1-score +- Generates and saves the confusion matrix as `./images/confusion_matrix.png` + + + +## Visual Results + +**1. Siamese Network Training Loss** +

+<img src="./images/siamese_loss.png" alt="Siamese network training loss">
+
+The triplet loss of the Siamese encoder steadily decreases during training, showing that the network effectively learns to minimize distances between similar image pairs while separating dissimilar ones. + +--- + +**2. Binary Classifier Loss** +

+<img src="./images/classifier_loss.png" alt="Binary classifier training and validation loss">
+
+
+The CrossEntropy loss for both training and validation sets declines consistently, indicating stable convergence.
+Validation loss flattens toward the final epochs, suggesting reasonable generalization with only mild overfitting.
+
+---
+
+**3. Confusion Matrix**

+<img src="./images/confusion_matrix.png" alt="Confusion matrix on the test set">
+
+
+The confusion matrix shows that the classifier correctly identifies most benign and malignant lesions.
+Diagonal dominance indicates balanced performance across the two classes, consistent with the ~81% test accuracy reported below.
+
+---
+
+**Sample Input Example**

+<img src="./images/input_sample.png" alt="Augmented sample input image">
+
+This sample dermoscopic image was randomly **rotated** and **color-adjusted** as part of data augmentation. +Such transformations increase dataset diversity and improve model robustness to variations in image orientation and illumination. + + + +## Training & Evaluation Logs + +Below are condensed console outputs from **train.py** and **predict.py**. +They demonstrate proper training convergence, early stopping, and final evaluation results. + +### Training Log (`train.py`) +The Siamese encoder stops early due to validation loss plateauing, +while the classifier converges smoothly to around **82% validation accuracy**. + +``` +Device: cuda +[INFO] Loaded 33126 samples from train-metadata.csv +[Siamese] Epoch 1/100 train_loss=0.9653 val_loss=0.8922 +[Siamese] Epoch 2/100 train_loss=0.8287 val_loss=0.6524 +[Siamese] Epoch 3/100 train_loss=0.6778 val_loss=0.6933 +[Siamese] Epoch 4/100 train_loss=0.5562 val_loss=0.6903 +. +. +. +[Siamese] Early stopping at epoch 14 +[INFO] Saved final Siamese encoder (stopped model). +[INFO] Extracting embeddings... +[Extract] 100.0% complete +[CLS] Epoch 1/80 train_loss=0.6952 val_loss=0.6876 val_acc=50.00% +[CLS] Epoch 5/80 train_loss=0.6495 val_loss=0.6600 val_acc=50.00% +[CLS] Epoch 10/80 train_loss=0.5977 val_loss=0.6255 val_acc=81.63% +[CLS] Epoch 20/80 train_loss=0.4247 val_loss=0.5239 val_acc=81.63% +[CLS] Epoch 28/80 train_loss=0.2580 val_loss=0.4580 val_acc=82.65% +[CLS] Epoch 33/80 train_loss=0.1685 val_loss=0.4575 val_acc=82.65% +[CLS] Early stopping at epoch 35 +[INFO] Saved final classifier (stopped model). +[INFO] Training finished. All results saved to ./images +``` + + +### Evaluation Log (`predict.py`) +After loading trained models, the classifier achieved 81% test accuracy with balanced precision and recall. + +``` +/content/siamese +Device: cuda +[INFO] Loaded 33126 samples from train-metadata.csv +[INFO] Extracting test features... +[Extract] 100.0% complete +[TEST] Accuracy: 80.51% +[TEST] Confusion Matrix: + [[113 23] + [ 30 106]] + +[TEST] Classification Report: + precision recall f1-score support + benign(0) 0.80 0.82 0.81 136 +malignant(1) 0.81 0.79 0.80 136 + accuracy 0.81 272 + macro avg 0.81 0.81 0.81 272 +weighted avg 0.81 0.81 0.81 272 + +[INFO] Saved confusion_matrix.png to: ./images +``` + + + +## Discussion and Future Work + +The Siamese encoder successfully learned a discriminative embedding space, as reflected by the steadily decreasing triplet loss during training. +However, the validation loss showed noticeable oscillation, suggesting that the triplet sampling strategy may not consistently produce informative anchor–positive–negative pairs. +While the classifier achieved stable convergence and balanced performance (precision and recall ≈ 0.8), the overall accuracy plateaued around 81–82%, indicating that generalization to unseen samples remains limited. + +Several factors may explain these observations: +- The dataset exhibits **class imbalance** and **intra-class variability**, which can make triplet formation unstable. +- The **triplet margin** and **sampling strategy** were fixed throughout training, potentially limiting the diversity of hard examples. + +**Future Work** +- Implement **hard or semi-hard negative mining** to improve triplet selection and reduce validation fluctuation. +- Explore **alternative metric learning losses** (e.g., ArcFace, Contrastive Loss) to enhance inter-class margins and improve embedding quality. + + +## References + +1. 
**ISIC 2020 Challenge Dataset** – *SIIM-ISIC Melanoma Classification* (Kaggle): + https://www.kaggle.com/datasets/nischaydnk/isic-2020-jpg-256x256-resized/data + +2. **Triplet Margin Loss (PyTorch Documentation)** – + https://pytorch.org/docs/stable/generated/torch.nn.TripletMarginLoss.html + +3. **CrossEntropy Loss (PyTorch Documentation)** – + https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html + +4. **G. Koch, R. Zemel, R. Salakhutdinov et al.**, + *Siamese Neural Networks for One-Shot Image Recognition*, + in *ICML Deep Learning Workshop*, 2015. diff --git a/recognition/siamese/dataset.py b/recognition/siamese/dataset.py new file mode 100644 index 000000000..1410122ec --- /dev/null +++ b/recognition/siamese/dataset.py @@ -0,0 +1,314 @@ +# ISIC 2020 (preprocessed, 256x256) dataset utils +# Provides: +# - ISICTable: load and split metadata table +# - ISICImageDataset: standard image dataset for classification +# - ISICTripletDataset: triplet dataset for siamese training +# - get_loaders: prepare DataLoader objects for training and evaluation +# Author: s4778251 + +import os +import random +from pathlib import Path +from typing import Tuple, Optional, List +import pandas as pd +from PIL import Image +from sklearn.model_selection import StratifiedShuffleSplit, GroupShuffleSplit +import torch +from torch.utils.data import Dataset, DataLoader +import torchvision.transforms as T + +from params import ( + DATAPATH, CSV_NAME, IMG_DIR, SEED, + TRAIN_FRAC, VAL_FRAC, TEST_FRAC, USE_GROUP_SPLIT, + BATCH_TRIPLET, BATCH_CLASSIF, NUM_WORKERS, + MEAN, STD, IMAGE_SIZE, ROT_DEG, FLIP_PROB, COLOR_JITTER +) + + +class ISICTable: + """Handle ISIC2020 metadata table loading, cleaning, and splitting.""" + + def __init__(self, root: str, csv_name: str = CSV_NAME, image_dir: str = IMG_DIR): + """Load and preprocess ISIC metadata table. + + Args: + root (str): Root directory containing CSV and image folder. + csv_name (str): Name of the CSV file with metadata. + image_dir (str): Subdirectory containing images. + """ + self.root = Path(root) + df = pd.read_csv(self.root / csv_name) + + # Remove unnamed index column if present (common artifact from CSV export) + if df.columns[0].lower().startswith("unnamed"): + df = df.drop(columns=[df.columns[0]]) + + # Normalize column names and keep only relevant ones + df.columns = [c.strip().lower() for c in df.columns] + df = df[["isic_id", "patient_id", "target"]] + + # Construct image file paths + img_dir_path = self.root / image_dir / "image" + df["filepath"] = df["isic_id"].astype(str).apply(lambda x: str(img_dir_path / f"{x}.jpg")) + + # Keep only existing image files + df = df[df["filepath"].apply(os.path.exists)].reset_index(drop=True) + df["target"] = df["target"].astype(int) + + if len(df) == 0: + raise RuntimeError(f"No .jpg images found in {img_dir_path}.") + self.df = df + print(f"[INFO] Loaded {len(df)} samples from {csv_name}") + + + def _split_no_group(self, train: float, val: float, seed: int): + """Perform stratified split without grouping by patient IDs. + + Args: + train (float): Training set fraction. + val (float): Validation set fraction. + seed (int): Random seed. 
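+
+        Note:
+            The held-out pool is split with train_size = val / (1 - train),
+            so for a 70/10/20 request the validation set takes
+            0.1 / 0.3 ≈ 1/3 of the remaining 30% of samples.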
+ + Returns: + tuple(pd.DataFrame): (train_df, val_df, test_df) + """ + y = self.df["target"].values + + # First split into train and (val+test) + sss = StratifiedShuffleSplit(n_splits=1, train_size=train, random_state=seed) + train_idx, temp_idx = next(sss.split(self.df, y)) + temp = self.df.iloc[temp_idx] + y_temp = temp["target"].values + + # Split remaining into validation and test + sss2 = StratifiedShuffleSplit(n_splits=1, train_size=val / (1.0 - train), random_state=seed) + val_rel, test_rel = next(sss2.split(temp, y_temp)) + val_idx = temp.index[val_rel] + test_idx = temp.index[test_rel] + return ( + self.df.loc[train_idx].reset_index(drop=True), + self.df.loc[val_idx].reset_index(drop=True), + self.df.loc[test_idx].reset_index(drop=True), + ) + + + def _split_with_group(self, train: float, val: float, seed: int): + """Perform group-aware split by patient IDs. + + Args: + train (float): Training set fraction. + val (float): Validation set fraction. + seed (int): Random seed. + + Returns: + tuple(pd.DataFrame): (train_df, val_df, test_df) + """ + y = self.df["target"].values + groups = self.df["patient_id"].astype(str).values + + # Train / temp split using group-level shuffle + gss = GroupShuffleSplit(n_splits=1, train_size=train, random_state=seed) + train_idx, temp_idx = next(gss.split(self.df, y, groups)) + temp = self.df.iloc[temp_idx] + y_temp = temp["target"].values + groups_temp = temp["patient_id"].astype(str).values + + # Split remaining into validation and test + gss2 = GroupShuffleSplit(n_splits=1, train_size=val / (1.0 - train), random_state=seed) + val_rel, test_rel = next(gss2.split(temp, y_temp, groups_temp)) + val_idx = temp.index[val_rel] + test_idx = temp.index[test_rel] + return ( + self.df.loc[train_idx].reset_index(drop=True), + self.df.loc[val_idx].reset_index(drop=True), + self.df.loc[test_idx].reset_index(drop=True), + ) + + + def split(self, train=TRAIN_FRAC, val=VAL_FRAC, test=TEST_FRAC, + use_group: bool = USE_GROUP_SPLIT, seed: int = SEED): + """Split the dataset into train, validation, and test sets. + + Args: + train (float): Fraction for training set. + val (float): Fraction for validation set. + test (float): Fraction for test set. + use_group (bool): Whether to use group-aware splitting. + seed (int): Random seed. + + Returns: + tuple(pd.DataFrame): (train_df, val_df, test_df) + """ + assert abs(train + val + test - 1.0) < 1e-6 # sanity check + if use_group and "patient_id" in self.df.columns: + return self._split_with_group(train, val, seed) + return self._split_no_group(train, val, seed) + + + @staticmethod + def balance_1to1(df: pd.DataFrame, seed: int = SEED) -> pd.DataFrame: + """Balance dataset to a 1:1 ratio between positive and negative samples. + + Args: + df (pd.DataFrame): Input dataframe with 'target' column. + seed (int): Random seed. + + Returns: + pd.DataFrame: Balanced dataframe. 
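+
+        Example (illustrative; assumes both classes are present in df):
+            >>> balanced = ISICTable.balance_1to1(df, seed=42)
+            >>> (balanced["target"] == 0).sum() == (balanced["target"] == 1).sum()
+            True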
+ """ + pos = df[df["target"] == 1] + neg = df[df["target"] == 0] + if len(pos) == 0 or len(neg) == 0: + return df.reset_index(drop=True) + if len(pos) < len(neg): + neg = neg.sample(n=len(pos), random_state=seed) + else: + pos = pos.sample(n=len(neg), random_state=seed) + out = pd.concat([pos, neg]).sample(frac=1.0, random_state=seed) + return out.reset_index(drop=True) + + +class ISICImageDataset(Dataset): + """Torch dataset for standard classification mode.""" + + def __init__(self, df: pd.DataFrame, transform=None): + self.df = df.reset_index(drop=True) + self.tfm = transform + + def __len__(self) -> int: + """Return number of samples.""" + return len(self.df) + + def __getitem__(self, i: int): + """Load and transform the i-th sample. + + Args: + i (int): Sample index. + + Returns: + tuple(torch.Tensor, int, int): (image, label, index) + """ + row = self.df.iloc[i] + img = Image.open(row["filepath"]).convert("RGB") + if self.tfm: + img = self.tfm(img) + label = int(row["target"]) + return img, label, i + + +class ISICTripletDataset(Dataset): + """Torch dataset for triplet generation (anchor, positive, negative).""" + + def __init__(self, df: pd.DataFrame, transform=None, seed: int = SEED): + self.df = df.reset_index(drop=True) + self.tfm = transform + + # Index samples by class for easy positive/negative sampling + self.by_cls = { + 0: self.df[self.df["target"] == 0].index.tolist(), + 1: self.df[self.df["target"] == 1].index.tolist(), + } + random.seed(seed) + + def __len__(self) -> int: + return len(self.df) + + def _load(self, idx: int): + """Load one image by its index and apply transforms if defined.""" + path = self.df.iloc[idx]["filepath"] + img = Image.open(path).convert("RGB") + return self.tfm(img) if self.tfm else img + + def __getitem__(self, i: int): + """Return a triplet (anchor, positive, negative, label).""" + anc_row = self.df.iloc[i] + y = int(anc_row["target"]) + + # Pick a positive sample from same class (not itself) + same = [j for j in self.by_cls[y] if j != i] + pos_idx = random.choice(same) if same else i + + # Pick a negative sample from opposite class + neg_idx = random.choice(self.by_cls[1 - y]) + anc = self._load(i) + pos = self._load(pos_idx) + neg = self._load(neg_idx) + return anc, pos, neg, y + + +def build_transforms(image_size: int = IMAGE_SIZE): + """Create image transformations for training and evaluation.""" + train_tfm = T.Compose([ + T.RandomHorizontalFlip(p=FLIP_PROB), + T.RandomVerticalFlip(p=FLIP_PROB), + T.RandomRotation(degrees=ROT_DEG), + T.ColorJitter(**COLOR_JITTER), + T.ToTensor(), + T.Normalize(mean=MEAN, std=STD), + ]) + eval_tfm = T.Compose([ + T.ToTensor(), + T.Normalize(mean=MEAN, std=STD), + ]) + return train_tfm, eval_tfm + + +def get_loaders( + dataroot: str = DATAPATH, + balance_each_split: bool = True, + use_group_split: bool = USE_GROUP_SPLIT, + batch_triplet: int = BATCH_TRIPLET, + batch_classif: int = BATCH_CLASSIF, + num_workers: int = NUM_WORKERS, +): + """Build dataloaders for Siamese and classification training. + + Args: + dataroot (str): Root dataset directory. + balance_each_split (bool): Whether to balance classes in each split. + use_group_split (bool): Whether to use patient-based group splitting. + batch_triplet (int): Batch size for triplet dataloader. + batch_classif (int): Batch size for classification dataloader. + num_workers (int): Number of parallel data-loading workers. + + Returns: + dict[str, torch.utils.data.DataLoader]: Dictionary of dataloaders. 
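+
+    Example (a minimal sketch; keys match the dict returned below):
+        >>> loaders = get_loaders()
+        >>> anc, pos, neg, y = next(iter(loaders["triplet_train"]))
+        >>> img, label, idx = next(iter(loaders["classif_train"]))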
+ """ + table = ISICTable(dataroot, CSV_NAME, IMG_DIR) + tr_df, va_df, te_df = table.split(train=TRAIN_FRAC, val=VAL_FRAC, test=TEST_FRAC, + use_group=use_group_split, seed=SEED) + if balance_each_split: + tr_df = ISICTable.balance_1to1(tr_df, seed=SEED) + va_df = ISICTable.balance_1to1(va_df, seed=SEED) + te_df = ISICTable.balance_1to1(te_df, seed=SEED) + + tfm_train, tfm_eval = build_transforms(image_size=IMAGE_SIZE) + + # Datasets for Siamese training + ds_triplet = ISICTripletDataset(tr_df, transform=tfm_train, seed=SEED) + dl_triplet = DataLoader(ds_triplet, batch_size=batch_triplet, shuffle=True, + num_workers=num_workers, pin_memory=True, drop_last=True) + + ds_val_triplet = ISICTripletDataset(va_df, transform=tfm_eval, seed=SEED) + dl_val_triplet = DataLoader(ds_val_triplet, batch_size=batch_triplet, shuffle=False, + num_workers=num_workers, pin_memory=True, drop_last=False) + + # Datasets for classification + ds_tr_cls = ISICImageDataset(tr_df, transform=tfm_train) + ds_va_cls = ISICImageDataset(va_df, transform=tfm_eval) + ds_te_cls = ISICImageDataset(te_df, transform=tfm_eval) + + dl_tr_cls = DataLoader(ds_tr_cls, batch_size=batch_classif, shuffle=True, + num_workers=num_workers, pin_memory=True) + dl_va_cls = DataLoader(ds_va_cls, batch_size=batch_classif, shuffle=False, + num_workers=num_workers, pin_memory=True) + dl_te_cls = DataLoader(ds_te_cls, batch_size=batch_classif, shuffle=False, + num_workers=num_workers, pin_memory=True) + + return { + "triplet_train": dl_triplet, + "triplet_val": dl_val_triplet, + "classif_train": dl_tr_cls, + "classif_val": dl_va_cls, + "classif_test": dl_te_cls, + } diff --git a/recognition/siamese/images/Siamese Network.webp b/recognition/siamese/images/Siamese Network.webp new file mode 100644 index 000000000..59f72428e Binary files /dev/null and b/recognition/siamese/images/Siamese Network.webp differ diff --git a/recognition/siamese/images/classifier_loss.png b/recognition/siamese/images/classifier_loss.png new file mode 100644 index 000000000..9fc9a946e Binary files /dev/null and b/recognition/siamese/images/classifier_loss.png differ diff --git a/recognition/siamese/images/confusion_matrix.png b/recognition/siamese/images/confusion_matrix.png new file mode 100644 index 000000000..7f2eaa7dd Binary files /dev/null and b/recognition/siamese/images/confusion_matrix.png differ diff --git a/recognition/siamese/images/input_sample.png b/recognition/siamese/images/input_sample.png new file mode 100644 index 000000000..97a5ee1a8 Binary files /dev/null and b/recognition/siamese/images/input_sample.png differ diff --git a/recognition/siamese/images/siamese_loss.png b/recognition/siamese/images/siamese_loss.png new file mode 100644 index 000000000..041cde148 Binary files /dev/null and b/recognition/siamese/images/siamese_loss.png differ diff --git a/recognition/siamese/modules.py b/recognition/siamese/modules.py new file mode 100644 index 000000000..eeffaaf60 --- /dev/null +++ b/recognition/siamese/modules.py @@ -0,0 +1,100 @@ +# modules.py +# Siamese network modules: encoder and classifier. +# Author: s4778251 + + + +import torch +import torch.nn as nn +import torchvision.models as models +from params import OUT_DIM, HIDDEN_DIMS, NEGATIVE_SLOPE, DROPOUT_P + + +class SiameseEncoder(nn.Module): + """Feature extraction network for Siamese training. + + This encoder uses a ResNet-50 backbone pretrained on ImageNet and projects the + resulting feature vector into a normalized embedding space. 
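+    Because the embeddings are L2-normalized onto the unit hypersphere,
+    pairwise Euclidean distances are bounded (at most 2.0) and directly
+    comparable across batches.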
It is typically + trained using triplet loss to ensure semantically similar samples are close + together while dissimilar ones are farther apart. + """ + + def __init__(self, out_dim=OUT_DIM, pretrained=True): + """ + Args: + out_dim (int): Dimensionality of the output embedding vector. + pretrained (bool): Whether to initialize the ResNet-50 backbone with + ImageNet-pretrained weights. + """ + super().__init__() + + # Load the ResNet50 backbone and remove its final classification layer + base = models.resnet50(weights=models.ResNet50_Weights.DEFAULT if pretrained else None) + feat_dim = base.fc.in_features + base.fc = nn.Identity() + self.backbone = base + + # Linear projection to the target embedding dimension + self.proj = nn.Linear(feat_dim, out_dim) + + def forward(self, x): + """Forward pass through the encoder. + + Args: + x (torch.Tensor): Input batch of images with shape (B, 3, H, W). + + Returns: + torch.Tensor: L2-normalized embeddings of shape (B, out_dim). + """ + feat = self.backbone(x) # Extract features via ResNet + emb = self.proj(feat) # Project into embedding space + emb = nn.functional.normalize(emb, p=2, dim=1) # Normalize to unit length + return emb + + +class BinaryClassifier(nn.Module): + """Four-layer MLP classifier for binary prediction. + + This model takes precomputed embeddings (e.g., from SiameseEncoder) and maps + them through a sequence of fully connected layers with LeakyReLU activation + and dropout regularization. The output layer produces two logits for binary + classification. + """ + + def __init__(self, in_dim=OUT_DIM, hidden=HIDDEN_DIMS, num_classes=2, + negative_slope=NEGATIVE_SLOPE, p=DROPOUT_P): + """ + Args: + in_dim (int): Input feature dimension (should match encoder output). + hidden (tuple[int]): Sizes of hidden layers. + num_classes (int): Number of output classes (2 for binary tasks). + negative_slope (float): Slope for LeakyReLU activation. + p (float): Dropout probability for regularization. + """ + super().__init__() + layers = [] + last = in_dim + + # Build MLP layers dynamically from hidden size sequence + for h in hidden: + layers += [ + nn.Linear(last, h), + nn.LeakyReLU(negative_slope=negative_slope, inplace=True), + nn.Dropout(p) + ] + last = h + + # Final classification layer without activation + layers += [nn.Linear(last, num_classes)] + self.net = nn.Sequential(*layers) + + def forward(self, x): + """Forward pass through the classifier. + + Args: + x (torch.Tensor): Input feature batch (B, in_dim). + + Returns: + torch.Tensor: Output logits (B, num_classes). + """ + return self.net(x) diff --git a/recognition/siamese/params.py b/recognition/siamese/params.py new file mode 100644 index 000000000..3816b3646 --- /dev/null +++ b/recognition/siamese/params.py @@ -0,0 +1,55 @@ +# params.py +# Configuration parameters for Siamese network training and evaluation. +# Author: s4778251 + +# Dataset and Path Settings +DATAPATH = "./dataset" +CSV_NAME = "train-metadata.csv" +IMG_DIR = "train-image" + +MODELPATH = "./models" +IMAGEPATH = "./images" + +# Data Split and Loader +SEED = 42 # one true number! 
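+# Used for all dataset splits, class balancing, and triplet sampling.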
+TRAIN_FRAC, VAL_FRAC, TEST_FRAC = 0.7, 0.1, 0.2 # Dataset split ratios +USE_GROUP_SPLIT = True # Whether to use patient-based group split + +BATCH_TRIPLET = 64 # Batch size for Siamese training (triplet loss) +BATCH_CLASSIF = 64 # Batch size for classifier training +NUM_WORKERS = 4 # Number of worker threads for data loading + +# Image Preprocessing +MEAN = [0.5, 0.5, 0.5] +STD = [0.5, 0.5, 0.5] +IMAGE_SIZE = 256 # Image resize dimension +ROT_DEG = 15 # Max rotation degree for data augmentation +FLIP_PROB = 0.5 # Probability of horizontal/vertical flip +COLOR_JITTER = dict( # color jitter parameters + brightness=0.1, contrast=0.1, saturation=0.05, hue=0.02 +) + +# Model Settings +OUT_DIM = 512 # Output embedding dimension of Siamese encoder +HIDDEN_DIMS = (256, 64) # Hidden layer dimensions for classifier MLP +NEGATIVE_SLOPE = 0.01 # LeakyReLU slope for classifier +DROPOUT_P = 0.4 # Dropout probability for classifier layers + +# Training Hyperparameters +TRIPLET_MARGIN = 1.0 # Margin for triplet loss +EPOCHS_SIAMESE = 100 # Max epochs for Siamese encoder training +EPOCHS_CLS = 80 # Max epochs for classifier training +LR_SIAMESE = 0.0001 # Learning rate for Siamese encoder +LR_CLS = 0.0005 # Learning rate for classifier + +# Early Stop / Scheduler +PATIENCE = 5 # Early stopping patience in epochs +MIN_DELTA = 0.001 # Minimum improvement threshold for validation loss +SCHED_FACTOR = 0.5 +SCHED_PATIENCE = 3 + +# Output Filenames +SAVE_SAMPLE_NAME = "input_sample.png" # Example image filename +SIAMESE_LOSS_NAME = "siamese_loss.png" # Siamese loss plot filename +CLS_LOSS_NAME = "classifier_loss.png" # Classifier loss plot filename +CM_NAME = "confusion_matrix.png" # Confusion matrix filename diff --git a/recognition/siamese/predict.py b/recognition/siamese/predict.py new file mode 100644 index 000000000..28b30488e --- /dev/null +++ b/recognition/siamese/predict.py @@ -0,0 +1,69 @@ +# predct.py +# Evaluate trained Siamese encoder + classifier on test set. +# Author: s4778251 + + +import os +import torch +from sklearn.metrics import confusion_matrix, classification_report +from dataset import get_loaders +from modules import SiameseEncoder, BinaryClassifier +from utils import plot_confusion_matrix, extract_features +from params import MODELPATH, IMAGEPATH, OUT_DIM, CM_NAME + + +def main(): + """Evaluate trained Siamese encoder + classifier on the test set. + + This script: + 1. Loads trained model weights from disk. + 2. Extracts test embeddings using the Siamese encoder. + 3. Applies the trained binary classifier to predict classes. + 4. Computes accuracy, confusion matrix, and classification report. + 5. Saves the confusion matrix as an image. + + The results are printed to the console and saved in IMAGEPATH. 
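+
+    Note:
+        Expects trained weights at MODELPATH/siamese.pth and
+        MODELPATH/classifier.pth, as produced by train.py.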
+ """ + device = "cuda" if torch.cuda.is_available() else "cpu" + print("Device:", device) + + # Load dataloaders for evaluation + loaders = get_loaders() + cls_te = loaders["classif_test"] + + # Initialize models + encoder = SiameseEncoder(out_dim=OUT_DIM).to(device) + clf = BinaryClassifier(in_dim=OUT_DIM).to(device) + + # Load pretrained weights + encoder.load_state_dict(torch.load(os.path.join(MODELPATH, "siamese.pth"), map_location=device)) + clf.load_state_dict(torch.load(os.path.join(MODELPATH, "classifier.pth"), map_location=device)) + + print("[INFO] Extracting test features...") + + # Extract embeddings and labels from test set + Xte, yte = extract_features(encoder, cls_te, device) + + # Predict class logits using classifier + clf.eval() + with torch.no_grad(): + preds = clf(Xte.to(device)).argmax(1).cpu() + + # Compute evaluation metrics + acc = (preds == yte).float().mean().item() + cm = confusion_matrix(yte.numpy(), preds.numpy()) + + print(f"[TEST] Accuracy: {acc*100:.2f}%") + print("[TEST] Confusion Matrix:\n", cm) + print("\n[TEST] Classification Report:\n", + classification_report(yte.numpy(), preds.numpy(), + target_names=["benign(0)", "malignant(1)"])) + + # Save confusion matrix plot + plot_confusion_matrix(cm, classes=["Benign", "Malignant"], + save_path=os.path.join(IMAGEPATH, CM_NAME)) + print(f"[INFO] Saved {CM_NAME} to: {IMAGEPATH}") + + +if __name__ == "__main__": + main() diff --git a/recognition/siamese/train.py b/recognition/siamese/train.py new file mode 100644 index 000000000..3b322fa72 --- /dev/null +++ b/recognition/siamese/train.py @@ -0,0 +1,212 @@ +# train.py +# Train Siamese encoder + binary classifier on ISIC dataset. +# Author: s4778251 + + +from params import ( + MODELPATH, IMAGEPATH, TRIPLET_MARGIN, + LR_SIAMESE, LR_CLS, EPOCHS_SIAMESE, EPOCHS_CLS, + PATIENCE, MIN_DELTA, SCHED_FACTOR, SCHED_PATIENCE, + SIAMESE_LOSS_NAME, CLS_LOSS_NAME, SAVE_SAMPLE_NAME, OUT_DIM +) +from dataset import get_loaders +from modules import SiameseEncoder, BinaryClassifier +import os +import torch +import torch.nn as nn +from utils import ensure_dir, plot_lines, save_sample_input, extract_features + + +def train_siamese(encoder, train_loader, val_loader, device): + """Train the Siamese encoder using triplet loss. + + Args: + encoder (torch.nn.Module): Siamese feature encoder model. + train_loader (DataLoader): Training dataloader providing triplets. + val_loader (DataLoader): Validation dataloader providing triplets. + device (str): Device to perform computation ("cuda" or "cpu"). + + Returns: + tuple[list[float], list[float]]: + Two lists of per-epoch training and validation losses. 
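+
+    Note:
+        Training stops early once validation loss fails to improve by at
+        least MIN_DELTA for PATIENCE consecutive epochs; the saved weights
+        come from the stopping epoch, not the best-validation epoch.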
+ """ + encoder.train() + opt = torch.optim.Adam(encoder.parameters(), lr=LR_SIAMESE, betas=(0.9, 0.999)) + criterion = nn.TripletMarginLoss(margin=TRIPLET_MARGIN, p=2) + + tr_hist, va_hist = [], [] + best_val = float('inf') + waited = 0 # patience counter for early stopping + + for epoch in range(EPOCHS_SIAMESE): + encoder.train() + train_sum = 0.0 + # Iterate through triplets (anchor, positive, negative, label) + for anc, pos, neg, _ in train_loader: + anc, pos, neg = anc.to(device), pos.to(device), neg.to(device) + za, zp, zn = encoder(anc), encoder(pos), encoder(neg) + loss = criterion(za, zp, zn) + opt.zero_grad() + loss.backward() + opt.step() + train_sum += loss.item() + + avg_tr = train_sum / max(1, len(train_loader)) + + # Validation loop (no gradient computation) + encoder.eval() + val_sum = 0.0 + with torch.no_grad(): + for anc, pos, neg, _ in val_loader: + anc, pos, neg = anc.to(device), pos.to(device), neg.to(device) + za, zp, zn = encoder(anc), encoder(pos), encoder(neg) + vloss = criterion(za, zp, zn) + val_sum += vloss.item() + + avg_va = val_sum / max(1, len(val_loader)) + tr_hist.append(avg_tr) + va_hist.append(avg_va) + + print(f"[Siamese] Epoch {epoch+1}/{EPOCHS_SIAMESE} train_loss={avg_tr:.4f} val_loss={avg_va:.4f}") + + # Early stopping logic + if avg_va < best_val - MIN_DELTA: + best_val = avg_va + waited = 0 + else: + waited += 1 + if waited >= PATIENCE: + print(f"[Siamese] Early stopping at epoch {epoch+1}") + break + + # Save trained encoder weights + torch.save(encoder.state_dict(), os.path.join(MODELPATH, "siamese.pth")) + print("[INFO] Saved final Siamese encoder (stopped model).") + return tr_hist, va_hist + + +def train_classifier(clf, train_data, val_data, device): + """Train the binary classifier on precomputed embeddings. + + Args: + clf (torch.nn.Module): Binary classification MLP model. + train_data (tuple[Tensor, Tensor]): Training embeddings and labels. + val_data (tuple[Tensor, Tensor]): Validation embeddings and labels. + device (str): Device ("cuda" or "cpu"). + + Returns: + tuple[list[float], list[float], list[float]]: + Training losses, validation losses, and validation accuracies. 
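+
+    Note:
+        Each epoch performs a single full-batch gradient step over all
+        training embeddings, which is practical because the precomputed
+        embeddings fit in memory at once (the per-epoch shuffle does not
+        change the value of the full-batch loss).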
+ """ + opt = torch.optim.Adam(clf.parameters(), lr=LR_CLS, betas=(0.9, 0.999), weight_decay=5e-4) + scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode='min', factor=SCHED_FACTOR, patience=SCHED_PATIENCE) + criterion = nn.CrossEntropyLoss() + + Xtr, ytr = train_data + Xva, yva = val_data + + tr_hist, va_hist, va_acc_hist = [], [], [] + best_val = float('inf') + waited = 0 + + for epoch in range(EPOCHS_CLS): + clf.train() + # Shuffle embeddings before each epoch + idx = torch.randperm(len(Xtr)) + Xb, yb = Xtr[idx].to(device), ytr[idx].to(device) + logits = clf(Xb) + loss = criterion(logits, yb) + opt.zero_grad() + loss.backward() + opt.step() + train_loss = loss.item() + + # Validation phase + clf.eval() + with torch.no_grad(): + v_logits = clf(Xva.to(device)) + val_loss = criterion(v_logits, yva.to(device)).item() + val_acc = (v_logits.argmax(1).cpu() == yva).float().mean().item() + + scheduler.step(val_loss) + tr_hist.append(train_loss) + va_hist.append(val_loss) + va_acc_hist.append(val_acc) + + print(f"[CLS] Epoch {epoch+1}/{EPOCHS_CLS} train_loss={train_loss:.4f} val_loss={val_loss:.4f} val_acc={val_acc*100:.2f}%") + + # Early stopping + if val_loss < best_val - MIN_DELTA: + best_val = val_loss + waited = 0 + else: + waited += 1 + if waited >= PATIENCE: + print(f"[CLS] Early stopping at epoch {epoch+1}") + break + + # Save classifier weights + torch.save(clf.state_dict(), os.path.join(MODELPATH, "classifier.pth")) + print("[INFO] Saved final classifier (stopped model).") + return tr_hist, va_hist, va_acc_hist + + +def main(): + """Main training routine for Siamese + classification stages. + + This pipeline: + 1. Loads dataset loaders for triplet and classification tasks. + 2. Trains the Siamese encoder using triplet loss. + 3. Extracts embeddings from the encoder for classification training. + 4. Trains the binary classifier using cross-entropy loss. + 5. Plots and saves both loss curves and one sample input image. 
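+
+    Note:
+        The encoder is switched to eval mode before embedding extraction,
+        so the classifier is trained on a fixed set of features computed
+        once from the frozen encoder.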
+ """ + device = "cuda" if torch.cuda.is_available() else "cpu" + print("Device:", device) + + # Prepare data loaders + loaders = get_loaders() + tri_train = loaders["triplet_train"] + tri_val = loaders["triplet_val"] + cls_tr = loaders["classif_train"] + cls_va = loaders["classif_val"] + + # Ensure output directories exist + ensure_dir(MODELPATH) + ensure_dir(IMAGEPATH) + + # Train Siamese Encoder + encoder = SiameseEncoder(out_dim=OUT_DIM).to(device) + siam_tr_hist, siam_va_hist = train_siamese(encoder, tri_train, tri_val, device) + + # Plot Siamese loss curve + xs = list(range(1, len(siam_tr_hist) + 1)) + plot_lines(xs, [siam_tr_hist, siam_va_hist], ["Training", "Validation"], + title="Loss of the Siamese Network", + xlabel="Epochs", ylabel="Triplet Loss", + save_path=os.path.join(IMAGEPATH, SIAMESE_LOSS_NAME)) + + # Extract Embeddings + print("[INFO] Extracting embeddings...") + encoder.eval() + Xtr, ytr = extract_features(encoder, cls_tr, device) + Xva, yva = extract_features(encoder, cls_va, device) + + # Train Classifier + clf = BinaryClassifier(in_dim=OUT_DIM).to(device) + cls_tr_hist, cls_va_hist, _ = train_classifier(clf, (Xtr, ytr), (Xva, yva), device) + + # Plot classifier loss curve + xs = list(range(1, len(cls_tr_hist) + 1)) + plot_lines(xs, [cls_tr_hist, cls_va_hist], ["Training", "Validation"], + title="Loss of the Binary Classifier", + xlabel="Epochs", ylabel="CrossEntropy Loss", + save_path=os.path.join(IMAGEPATH, CLS_LOSS_NAME)) + + # Save a sample input image for reference + save_sample_input(loaders["classif_train"], IMAGEPATH, filename=SAVE_SAMPLE_NAME) + print(f"[INFO] Training finished. All results saved to {IMAGEPATH}") + + +if __name__ == "__main__": + main() diff --git a/recognition/siamese/utils.py b/recognition/siamese/utils.py new file mode 100644 index 000000000..ea06b987d --- /dev/null +++ b/recognition/siamese/utils.py @@ -0,0 +1,151 @@ +# utils.py +# Utility functions for Siamese network training and evaluation. +# Includes directory management, plotting, sample saving, and feature extraction. +# Author: s4778251 + +import os +import torch +import matplotlib.pyplot as plt +import numpy as np +import torchvision +from params import MEAN, STD, SAVE_SAMPLE_NAME + + +def ensure_dir(path): + """Create a directory if it does not already exist. + + Args: + path (str | None): Directory path to create. If None or empty, nothing is created. + + Notes: + This is a safe helper that mirrors `mkdir -p` behavior. It never raises if the + directory already exists and does nothing for falsy paths. + """ + if path and not os.path.exists(path): + os.makedirs(path, exist_ok=True) + + +def plot_lines(xs, ys_list, labels, title, xlabel, ylabel, save_path): + """Plot one or more lines and save the figure to disk. + + Args: + xs (Sequence[float | int]): Common x-axis values for all lines. + ys_list (Sequence[Sequence[float]]): A list of y-value sequences, one per line. + labels (Sequence[str]): Legend labels corresponding to each sequence in ys_list. + title (str): Figure title. + xlabel (str): X-axis label. + ylabel (str): Y-axis label. + save_path (str): File path where the plot image will be saved. + + Behavior: + Creates a new figure, draws each line in order, adds a legend if labels are given, + tightens the layout, ensures the parent directory exists, saves the image, and + finally closes the figure to free memory. 
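+
+    Example (illustrative values and path):
+        >>> plot_lines([1, 2, 3], [[0.9, 0.6, 0.4]], ["train"],
+        ...            title="Loss", xlabel="Epoch", ylabel="Triplet Loss",
+        ...            save_path="./images/demo_loss.png")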
+ """ + plt.figure() + for ys, lb in zip(ys_list, labels): + plt.plot(xs, ys, label=lb) # one line per series + if labels: + plt.legend() + plt.title(title) + plt.xlabel(xlabel) + plt.ylabel(ylabel) + plt.tight_layout() + ensure_dir(os.path.dirname(save_path)) + plt.savefig(save_path, dpi=200) + plt.close() # avoid accumulating open figures + + +def plot_confusion_matrix(cm, classes, save_path): + """Visualize and save a confusion matrix as an image. + + Args: + cm (np.ndarray): Square confusion matrix of integer counts with shape (C, C). + classes (Sequence[str]): Class names for tick labels, length must be C. + save_path (str): File path where the plot image will be saved. + + """ + plt.figure() + plt.imshow(cm, interpolation='nearest', aspect='auto') + plt.title('Confusion Matrix') + plt.colorbar() + tick_marks = np.arange(len(classes)) + plt.xticks(tick_marks, classes, rotation=45) + plt.yticks(tick_marks, classes) + thresh = cm.max() / 2.0 if cm.size else 0 + for i in range(cm.shape[0]): + for j in range(cm.shape[1]): + plt.text( + j, + i, + format(cm[i, j], 'd'), + ha="center", + va="center", + color="white" if cm[i, j] > thresh else "black", + ) + plt.ylabel('True label') + plt.xlabel('Predicted label') + plt.tight_layout() + ensure_dir(os.path.dirname(save_path)) + plt.savefig(save_path, dpi=200) + plt.close() + + +def save_sample_input(dataloader, save_dir, filename=SAVE_SAMPLE_NAME): + """Save a single example image from a dataloader to disk for quick inspection. + + Args: + dataloader (torch.utils.data.DataLoader): A dataloader that yields (image, label, index). + save_dir (str): Directory where the image should be stored. + filename (str): File name for the saved image. + + Behavior: + Takes the first batch, inverts the normalization using the configured mean and std, + converts the first image to HWC layout, clips to the valid range, and saves it as a PNG. + """ + ensure_dir(save_dir) + sample_img, sample_label, _ = next(iter(dataloader)) + img = sample_img[0] + # Build an "inverse" normalization to undo the standardization for visualization. + inv_norm = torchvision.transforms.Normalize( + mean=[-m / s for m, s in zip(MEAN, STD)], + std=[1 / s for s in STD], + ) + img_show = inv_norm(img).permute(1, 2, 0).clamp(0, 1) + plt.imshow(img_show) + plt.title(f"Sample Input (Label: {sample_label[0].item()})") + plt.axis("off") + path = os.path.join(save_dir, filename) + plt.savefig(path, bbox_inches="tight", dpi=200) + plt.close() + + +@torch.no_grad() +def extract_features(encoder, loader, device): + """Run a feature encoder over a dataset and collect embeddings and labels. + + Args: + encoder (torch.nn.Module): Model that maps images to embedding vectors. + loader (torch.utils.data.DataLoader): Dataloader yielding (image, label, index). + device (str | torch.device): Device spec to run the encoder on, e.g. "cuda" or "cpu". + + Returns: + Tuple[torch.Tensor, torch.Tensor]: + A pair (X, y) where X has shape (N, D) of embeddings and y has shape (N,) of labels. + + Notes: + The function prints progress every 10 batches and at completion. Gradients are disabled + via the torch.no_grad decorator to reduce memory use and increase throughput. 
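+
+    Example (a sketch; `encoder` and `loaders` as built in train.py):
+        >>> X, y = extract_features(encoder, loaders["classif_val"], "cuda")
+        >>> X.shape[0] == y.shape[0]
+        True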
+ """ + encoder.eval() # ensure batchnorm/dropout layers are in eval mode + xs, ys = [], [] + total = len(loader) + for i, (xb, yb, _) in enumerate(loader): + feats = encoder(xb.to(device)).cpu() # move inputs to device, bring features back to CPU + xs.append(feats) + ys.append(yb) + if (i + 1) % 10 == 0 or (i + 1) == total: + pct = 100.0 * (i + 1) / total + print(f"\r[Extract] {pct:5.1f}% complete", end="") + print() + return torch.cat(xs), torch.cat(ys)