diff --git a/recognition/Siamese_Network_MAILLOT/README.md b/recognition/Siamese_Network_MAILLOT/README.md
new file mode 100644
index 000000000..0740efac9
--- /dev/null
+++ b/recognition/Siamese_Network_MAILLOT/README.md
@@ -0,0 +1,223 @@
+# Siamese network for the ISIC 2020 Kaggle Challenge classification
+
+Melissa Maillot - s4851573
+
+## Problem
+
+### Data
+
+The ISIC 2020 Kaggle Challenge is a classification problem where skin lesions need to be classified as either melanoma or normal. The dataset contains 33126 images and suffers from severe class imbalance, with only 584 melanoma samples against 32542 normal samples.
+
+### Siamese networks
+
+We implement a Siamese network with triplet loss to tackle this problem.
+
+Siamese networks are a type of metric learning model that compares the similarity of samples. A Siamese network learns to distinguish samples from different classes by using twin subnetworks with shared weights. Both samples are passed through the network, which extracts their features, and the distance between the extracted embedding vectors is compared.
+
+For classification, the Siamese network is trained to separate the classes in the embedding space, such that a classifier can then be trained on the embeddings. The Siamese network acts as a feature extractor that maximises the embedding distance between classes, so that the classifier can easily separate them in this high-dimensional space.
+
+Two types of loss are usually used to train a Siamese network: contrastive loss and triplet loss. In this implementation, we use triplet loss. Triplet loss compares the distance between the embeddings of an anchor and a sample of the same class (the positive) against the distance between the anchor and a sample from another class (the negative).
+
+## Implementation
+
+### Model
+
+The implemented neural network has two parts: a feature extractor and a classification head.
+
+The feature extractor architecture is a ResNet50 model (not pre-trained) from the PyTorch library [[1](#references)], with the last layer modified to change the extractor head and embedding dimension. As stated above, triplet loss was used.
+
+The classification head is a single-layer perceptron with an output dimension of two, one per class. Cross-entropy loss was used as the loss function for the classifier head.
+
+### Metrics
+
+Several metrics will be used to understand the model's performance.
+
+First, classification accuracy will be used to get a general idea of the model's performance. However, this metric alone does not capture the full performance of the classifier, as the heavy class imbalance makes it unreliable whenever the data subset being considered is not balanced.
+
+Then, we will consider the area under the receiver operating characteristic curve (ROC-AUC). It helps us understand how well the model finds true positives compared to false positives, which in turn helps us understand whether the classifier detects melanoma and whether it incorrectly classifies benign lesions as malignant.
+
+Also, according to [[2](#references)][[3](#references)][[4](#references)], ROC-AUC is not always ideal for binary classification, especially in our case of a highly imbalanced dataset. One of the main issues is that ROC-AUC does not allow us to correctly gauge the importance of false negatives. However, false negatives are extremely important in this context: an undetected melanoma can evolve into a life-threatening condition. As such, false negatives are much more worrying than false positives. Since ROC-AUC fails to fully capture their importance, we will also consider the area under the precision-recall curve (AUPRC), also called the average precision (AP) score. The precision-recall curve plots the precision (`tp/(tp+fp)`) against the recall (`tp/(tp+fn)`). The recall thus brings the much-needed information on false negatives into the metric and can help us gauge whether the model misses malignant cases. The AP score is implemented in `scikit-learn` [[5](#references)].
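+
+To make the contrast concrete, here is a small illustrative sketch (not part of this repository) of how ROC-AUC can stay flattering on heavily imbalanced labels while the AP score collapses; the roughly 1% positive rate loosely mirrors the melanoma fraction in ISIC 2020:
+
+```py
+# Illustrative only: on ~1% positives, a mediocre scorer keeps a decent
+# ROC-AUC while the AP score stays low.
+import numpy as np
+from sklearn.metrics import roc_auc_score, average_precision_score
+
+rng = np.random.default_rng(0)
+y_true = (rng.random(10_000) < 0.01).astype(int)                  # ~1% positive class
+scores = 0.9 * rng.random(10_000) + y_true * rng.random(10_000)   # noisy scorer
+
+print(f"ROC-AUC: {roc_auc_score(y_true, scores):.3f}")            # fairly high
+print(f"AP:      {average_precision_score(y_true, scores):.3f}")  # much lower
+```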
+
+### File structure
+
+#### Data downloading and storage
+
+The data used in this project is the preprocessed ISIC 2020 dataset available [here](https://www.kaggle.com/datasets/nischaydnk/isic-2020-jpg-256x256-resized/data). In this dataset, the images have been resized to `256x256`. The metadata file only contains the image labels, image names and patient IDs.
+
+To run the code in this repository, you need to download the dataset from the above Kaggle link to the machine that will run the code. Ideally, the downloaded materials should be placed in their own folder. The data needs to be reorganised to fit the following structure:
+
+```
+your-data-folder-name/
+├── train-metadata.csv
+└── image/
+    ├── ISIC_0015719.jpg
+    ├── ISIC_0052212.jpg
+    └── ...
+```
+
+This `your-data-folder-name` folder can be placed anywhere on the machine, so long as the path to the folder is passed to the `DATA_ROOT` hyperparameter. The parameter is currently set such that if the folder is named `data`, it should be placed in this location after cloning the repository:
+
+```
+PatternAnalysis-2025/recognition/Siamese_Network_MAILLOT/
+│
+├── README_figures/
+│   └── ...
+│
+├── dataset.py
+├── modules.py
+├── predict.py
+├── train.py
+├── README.md
+│
+└── data/
+    ├── train-metadata.csv
+    └── image/
+        ├── ISIC_0015719.jpg
+        ├── ISIC_0052212.jpg
+        └── ...
+```
+
+#### Code files
+
+`dataset.py` contains all the classes required for data manipulation and data loading. This module handles making an 80/10/10 train/validation/test split of the data. It also oversamples the minority class for the training set, such that the training set is balanced. At runtime, the training data is augmented with rotations, flips and colour jitter. The validation and testing sets are neither oversampled nor augmented.
+
+`modules.py` contains the neural network architectures and the triplet loss function implementation. The neural network consists of a ResNet50 and a simple classifier head. The triplet loss function is implemented by hand, following this equation (with L2 distances between embeddings):
+
+```
+L(A, P, N) = max(0, ||f(A) - f(P)|| - ||f(A) - f(N)|| + margin)
+```
+
+`train.py` contains the main training loop and its helper functions. The training loop saves the best model as well as the loss log and metric log plots to the data location folder (for ease of ignoring with git if needed; this does not affect the data loading). Calling this file runs the whole data retrieval, model training and testing code.
+
+`predict.py` contains code to evaluate the model on the test split of the dataset. It produces metrics as well as plots. Plots are also saved to the data location folder for consistency. The metrics computed are: accuracy, ROC AUC, AP score, sensitivity and specificity. The plots are: confusion matrix, ROC curve, precision-recall curve and t-SNE visualisation of embeddings.
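+
+For reference, here is a minimal sketch (not a file in this repository) of how the model saved by `train.py` could be reloaded for standalone evaluation, assuming the default `./data` location and the embedding dimension of 128 used in the results below:
+
+```py
+from pathlib import Path
+
+import torch
+
+from dataset import split_data
+from modules import EmbeddingNet
+from predict import test_set_evaluation
+
+DATA_ROOT = "./data"
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+# Rebuild the architecture, then load the weights saved by train.py.
+model = EmbeddingNet(out_dim=128).to(device)
+model.load_state_dict(torch.load(Path(DATA_ROOT) / "best_siamese_model.pth", map_location=device))
+
+# Recreate the same test split (the seed is fixed in dataset.py) and evaluate.
+_, _, test_samples = split_data(DATA_ROOT)
+test_set_evaluation(model, test_samples, device, DATA_ROOT)
+```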
+
+### Python and dependencies
+
+This project uses Python version `3.13.7`.
+
+Additionally, the following packages are required in the following versions:
+- torch: 2.8.0+cu126
+- torchvision: 0.23.0+cu126
+- numpy: 2.1.2
+- scikit-learn: 1.7.2
+- matplotlib: 3.10.6
+- pandas: 2.3.3
+
+## Results
+
+Here we present the results of the most successful training run.
+
+### Hyperparameters
+
+The hyperparameters for the model that gave the best metrics were as follows:
+
+```py
+EMBEDDING_DIM = 128
+MARGIN = 1.25
+NUM_EPOCHS = 20
+LEARNING_RATE = 1e-4
+TRAIN_DATA_SUBSET_FRACTION = 0.3
+TRAIN_BATCH_SIZE = 32
+VAL_TEST_BATCH_SIZE = 256
+```
+
+The optimiser used was `Adam` and the learning rate scheduler was `OneCycleLR` with the following parameters:
+
+```py
+max_lr=LEARNING_RATE
+steps_per_epoch=train_samples.shape[0]//TRAIN_BATCH_SIZE//100
+epochs=NUM_EPOCHS
+anneal_strategy="cos"
+```
+
+### Model training
+
+The model was trained for 20 epochs, but the model with the highest AP score was from epoch 10. The training and validation metrics of that model are as follows:
+
+![Best model training and validation metrics](README_figures/best_model_train_val_metrics.png)
+
+The loss over the different epochs shows that the model had a low loss on the validation set at that epoch.
+
+![Loss plotted against epochs](README_figures/loss_logs.png)
+
+There is also high validation accuracy at that epoch. The validation ROC AUC and the AP score are at their highest in that epoch.
+
+![Metrics plotted against epochs](README_figures/metrics_logs.png)
+
+We notice that the validation triplet loss, the ROC AUC and the AP score somewhat plateau after the tenth epoch. However, the classification loss and the classification accuracy continue increasing. This was not further investigated; it may be a result of training both the embedder and the classification head at the same time. It could be insightful to modify the training so that both components are trained separately, each with its own number of epochs, optimiser and scheduler. This was not tested due to lack of time.
+
+### Model testing
+
+The model was tested on the test set. The metrics were evaluated once on the full test set and once on a balanced subset of the test set, giving us different insights.
+
+Test metrics on the full test set were as follows:
+
+```
+Classification Accuracy: 0.8539
+ROC AUC: 0.8573
+Average Precision Score: 0.1503
+Sensitivity: 0.6034
+Specificity: 0.8584
+```
+
+Test metrics on the balanced test set sample were as follows:
+
+```
+Classification Accuracy: 0.7328
+ROC AUC: 0.8546
+Average Precision Score: 0.8437
+Sensitivity: 0.6034
+Specificity: 0.8621
+```
+
+The sensitivity is low, which shows the model predicts too many false negatives. The influence of the class imbalance is also seen in how the classification accuracy and the AP score change between the two evaluations.
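+
+Sensitivity and specificity are derived from the confusion matrix exactly as in `predict.py`; a condensed, self-contained sketch with toy labels (illustrative values only):
+
+```py
+from sklearn.metrics import confusion_matrix
+
+# Toy labels and predictions (0 = normal, 1 = melanoma).
+y_true = [0, 0, 0, 0, 1, 1, 1, 1]
+y_pred = [0, 0, 0, 1, 1, 0, 1, 0]
+
+tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
+print(f"Sensitivity: {tp / (tp + fn):.2f}")  # fraction of melanomas caught -> 0.50
+print(f"Specificity: {tn / (tn + fp):.2f}")  # fraction of normals cleared  -> 0.75
+```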
+
+The confusion matrices show the same issue.
+
+Here is the confusion matrix on the full test set:
+
+![Confusion matrix full test set](README_figures/confusion_matrix.png)
+
+Here is the confusion matrix on the test set sample:
+
+![Confusion matrix test set sample](README_figures/confusion_matrix_Subset.png)
+
+The ROC curve and the precision-recall curve on the test subset don't look too alarming.
+
+![ROC curve and PR curve test set sample](README_figures/ROCAUC_PRC_Subset.png)
+
+However, the precision-recall curve on the full test set tells a different story.
+
+![ROC curve and PR curve full test set](README_figures/ROCAUC_PRC.png)
+
+These plots also show that the ROC curve cannot always be trusted, especially with imbalanced datasets. The ROC curve looks similarly good in both cases, and the ROC AUC has looked promising throughout this whole process. The precision-recall curve here shows that the model is not performing as well as the ROC curve suggests.
+
+Now we consider the t-SNE representation of the embeddings.
+
+![tSNE full test set](README_figures/testing_tsne_embeddings.png)
+
+The visualisation of the embeddings for the full test set suggests the presence of two groups. The normal lesions are overwhelmingly present in both groups. The melanoma lesions are mostly found in the left part of the left group, which suggests that despite its modest performance, the model does find some sort of pattern in the data.
+
+![tSNE test set sample](README_figures/testing_tsne_embeddings_Subset.png)
+
+The embeddings on the balanced sample show that the two groups are distinct from each other to some extent. This may suggest that there are features that can discriminate between the two classes, but that those features have not been sufficiently learnt by the model.
+
+### Review of results
+
+This trained model is not optimal. Some metrics, such as the AP score (on the balanced testing subset) and the ROC AUC, seem to suggest that the model performs well. However, the problem at hand is a medical one, and misclassifications as false negatives can have life-threatening repercussions. Melanoma is one of the deadliest forms of skin cancer, so a missed diagnosis could end in the death of a patient. The problem with this model is its low sensitivity, i.e. its high false negative rate. A false positive is less of an issue, as manual review of lesions classified as positive is likely to take place. The goal of such a classifier is to filter out benign skin lesions so that their manual review is not needed. Any melanoma missed is therefore one too many, and the current model misses too many to be reliable. Improvements are required for this model to fully serve its intended function.
+
+## Improvements
+
+This model is far from optimal. The number of false negatives is still much too high. This is an issue, as melanoma can evolve into a life-threatening condition if not treated early. In that sense, the current model is unreliable on unseen data. Several points of improvement may include:
+- Training on a larger subset of the data: most of the data is currently not used in training, as the computing power to train on the full dataset was not available (training times were too long).
+- Replacing the plain triplet loss with batch-hard mining triplet loss. [[6](#references)] suggests that batch-hard mining is a more efficient way to train the model, as it only uses hard triplets to calculate the loss. Implementing this loss was attempted but unsuccessful: the model did not learn, it is unknown where the issue stemmed from, and there was not enough time to troubleshoot it. A sketch of the standard formulation is given after this list.
+- More extensive hyperparameter tuning and more exploration of different augmentation techniques.
+- Experimenting with the model architecture, whether by changing the embedder architecture or the classifier head, or even by making the embedder and the classifier two separate networks, to have more control over the training of each part.
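+
+For reference, a minimal sketch of the batch-hard formulation from [[6](#references)]; this is the standard recipe, not the failed attempt from this project:
+
+```py
+import torch
+
+def batch_hard_triplet_loss(embeddings, labels, margin=1.25):
+    """For each anchor, mine its hardest positive (farthest same-class sample)
+    and hardest negative (closest other-class sample) within the batch."""
+    dist = torch.cdist(embeddings, embeddings, p=2)        # (B, B) pairwise L2 distances
+    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # (B, B) same-class mask
+    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
+
+    hardest_pos = (dist * (same & ~eye)).max(dim=1).values                 # farthest positive
+    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values   # closest negative
+    return torch.relu(hardest_pos - hardest_neg + margin).mean()
+
+# Example: a batch of 8 random embeddings with both classes present.
+loss = batch_hard_triplet_loss(torch.randn(8, 128), torch.tensor([0, 0, 1, 1, 0, 1, 0, 1]))
+```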
+
+## References
+- [1] `resnet50` (torchvision documentation). Available at: https://docs.pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html
+- [2] The relationship between Precision-Recall and ROC curves. Available at: https://dl.acm.org/doi/10.1145/1143844.1143874
+- [3] Imbalanced Data? Stop Using ROC-AUC and Use AUPRC Instead. Available at: https://towardsdatascience.com/imbalanced-data-stop-using-roc-auc-and-use-auprc-instead-46af4910a494/
+- [4] ROC AUC vs Precision-Recall for Imbalanced Data. Available at: https://machinelearningmastery.com/roc-auc-vs-precision-recall-for-imbalanced-data/
+- [5] `average_precision_score`. Available at: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score
+- [6] In Defense of the Triplet Loss for Person Re-Identification. Available at: https://arxiv.org/pdf/1703.07737
\ No newline at end of file
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/ROCAUC_PRC.png b/recognition/Siamese_Network_MAILLOT/README_figures/ROCAUC_PRC.png
new file mode 100644
index 000000000..e8d6b32ba
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/ROCAUC_PRC.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/ROCAUC_PRC_Subset.png b/recognition/Siamese_Network_MAILLOT/README_figures/ROCAUC_PRC_Subset.png
new file mode 100644
index 000000000..870d60f29
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/ROCAUC_PRC_Subset.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/best_model_train_val_metrics.png b/recognition/Siamese_Network_MAILLOT/README_figures/best_model_train_val_metrics.png
new file mode 100644
index 000000000..44494209e
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/best_model_train_val_metrics.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/confusion_matrix.png b/recognition/Siamese_Network_MAILLOT/README_figures/confusion_matrix.png
new file mode 100644
index 000000000..0499cecd8
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/confusion_matrix.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/confusion_matrix_Subset.png b/recognition/Siamese_Network_MAILLOT/README_figures/confusion_matrix_Subset.png
new file mode 100644
index 000000000..f7b068f7b
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/confusion_matrix_Subset.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/loss_logs.png b/recognition/Siamese_Network_MAILLOT/README_figures/loss_logs.png
new file mode 100644
index 000000000..88ad69b79
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/loss_logs.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/metrics_logs.png b/recognition/Siamese_Network_MAILLOT/README_figures/metrics_logs.png
new file mode 100644
index 000000000..5d35294ba
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/metrics_logs.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/testing_tsne_embeddings.png b/recognition/Siamese_Network_MAILLOT/README_figures/testing_tsne_embeddings.png
new file mode 100644
index 000000000..4f692b397
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/testing_tsne_embeddings.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/README_figures/testing_tsne_embeddings_Subset.png b/recognition/Siamese_Network_MAILLOT/README_figures/testing_tsne_embeddings_Subset.png
new file mode 100644
index 000000000..4112dc710
Binary files /dev/null and b/recognition/Siamese_Network_MAILLOT/README_figures/testing_tsne_embeddings_Subset.png differ
diff --git a/recognition/Siamese_Network_MAILLOT/dataset.py b/recognition/Siamese_Network_MAILLOT/dataset.py
new file mode 100644
index 000000000..f8152bede
--- /dev/null
+++ b/recognition/Siamese_Network_MAILLOT/dataset.py
@@ -0,0 +1,192 @@
+"""
+Melissa Maillot - s4851573
+COMP3710 2025S2 - Report
+dataset.py - contains all data manipulation and Dataset code
+"""
+
+import torch
+from torch.utils.data import Dataset
+from torchvision import transforms
+import numpy as np
+import pandas as pd
+from sklearn.model_selection import train_test_split
+import random
+from PIL import Image
+from pathlib import Path
+
+SEED = 48515739
+random.seed(SEED)
+np.random.seed(SEED)
+
+
+def split_data(data_root):
+    """
+    Fetches the reference dataframe.
+    Splits the dataframe into 80/10/10 train/validation/test sets.
+    Oversamples the minority class to have an equal number of each class in the train set.
+    Returns three dataframes: the train set, the validation set, the test set.
+
+    Image files are not manipulated, as that would cause unnecessary overhead.
+    """
+    data_dir = Path(data_root)
+
+    # Fetch the image names and labels dataset and load to a dataframe
+    data_df = pd.read_csv((data_dir / "train-metadata.csv"), index_col=0)
+
+    # Get IDs and labels for dataset train/validation/test splitting
+    # The isic_id is unique
+    image_ids = data_df["isic_id"]
+    labels = data_df["target"]
+
+    # Split into train, validation and test sets
+    # 80% of data to train, 10% to validate, 10% to test
+    # Split train and validation/test
+    train_ids, val_test_ids, train_labels, val_test_labels = train_test_split(
+        image_ids, labels, test_size=0.2, stratify=labels, random_state=SEED
+    )
+    # Split validation and test
+    val_ids, test_ids, val_labels, test_labels = train_test_split(
+        val_test_ids, val_test_labels, test_size=0.5, stratify=val_test_labels, random_state=SEED
+    )
+
+    # Subset dataframe for train, validation and test
+    # The isic_id column will be used to fetch the images when dataloading
+    # The dataframe index is reset for ease of access at dataloading phase
+    train_samples = data_df[data_df["isic_id"].isin(train_ids)].reset_index(drop=True)
+    val_samples = data_df[data_df["isic_id"].isin(val_ids)].reset_index(drop=True)
+    test_samples = data_df[data_df["isic_id"].isin(test_ids)].reset_index(drop=True)
+
+    # Oversample the minority class in the training set
+    # There will be an equal amount of rows for each class
+    normal_samples_size = train_samples[train_samples["target"] == 0].shape[0]
+    melanoma_sample = train_samples[train_samples["target"] == 1]
+    oversample_sample = melanoma_sample.sample(n=normal_samples_size - melanoma_sample.shape[0], replace=True, random_state=SEED)
+
+    # Concatenate the data and the oversampled data into one dataframe
+    train_samples = pd.concat([train_samples, oversample_sample], ignore_index=True)
+    train_samples = train_samples.sample(frac=1).reset_index(drop=True)
+    # Logic: We duplicate some of the image references in the training data
+    # label dataframe. Since the images will be transformed when loaded, this
+    # will augment the melanoma samples. We only add duplicated rows, as this
+    # array is what gets iterated on by the Dataloader. There is no need to
+    # duplicate the image itself, which would be a wasteful use of memory.
+    # The augmented array is shuffled so that randomisation is ensured when
+    # dataloaders iterate the dataset.
+
+    return train_samples, val_samples, test_samples
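+
+# Example usage (illustrative; not called anywhere in this module):
+#   train_df, val_df, test_df = split_data("./data")
+#   # After oversampling, the training split is balanced:
+#   assert (train_df["target"] == 0).sum() == (train_df["target"] == 1).sum()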
+
+
+class TripletDataset(Dataset):
+    """
+    Custom Dataset for generating (Anchor, Positive, Negative) triplets
+    when given the pandas dataframe that lists the images in the set and their labels
+    """
+    def __init__(self, root_dir, items_df, transform=None):
+        # get the image folder path
+        self.image_dir = (Path(root_dir) / 'image')
+        # get the labels dataframe
+        self.items_df = items_df
+        # Label names
+        self.classes = ['normal', 'melanoma']
+
+        # Standard image transformations, to which we prepend any supplied transformations
+        self.transform = transforms.Compose(
+            (transform.transforms if transform else []) +
+            [transforms.ToTensor(),
+             transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]
+        )
+
+        # Total number of unique images to iterate over
+        self.len = self.items_df.shape[0]
+
+    def __len__(self):
+        return self.len
+
+    def __getitem__(self, index):
+        # 1. Select Anchor (A)
+        # Get image information from the dataframe
+        anchor = self.items_df.iloc[index]
+        # Get image label
+        anchor_class = anchor["target"]
+        # Get image
+        anchor_name = anchor["isic_id"]
+        anchor_image = Image.open(self.image_dir / (anchor_name + ".jpg")).convert('RGB')
+        # Transform image
+        anchor_image = self.transform(anchor_image)
+
+        # 2. Select Positive (P)
+        # Select an image from the same class as the anchor, but not the anchor itself
+        try:
+            positive = self.items_df[(self.items_df["isic_id"] != anchor_name) & (self.items_df["target"] == anchor_class)].sample()
+        except ValueError:
+            # Edge case where the anchor is the only image of its class
+            # (should not happen with real ISIC data): fall back to the anchor row
+            positive = self.items_df.iloc[[index]]
+
+        # Get image
+        positive_name = positive["isic_id"].item()
+        positive_image = Image.open(self.image_dir / (positive_name + ".jpg")).convert('RGB')
+        # Transform image
+        positive_image = self.transform(positive_image)
+
+        # 3. Select Negative (N)
+        # Select a class different from the anchor class (binary case is simple)
+        negative_class = 1 - anchor_class
+        # Select a negative sample
+        negative = self.items_df[self.items_df["target"] == negative_class].sample()
+        # Get image
+        negative_name = negative["isic_id"].item()
+        negative_image = Image.open(self.image_dir / (negative_name + ".jpg")).convert('RGB')
+        # Transform image
+        negative_image = self.transform(negative_image)
+
+        # Return triplet and the anchor's original label for verification/testing
+        return anchor_image, positive_image, negative_image, anchor_class
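+
+# Example usage (illustrative): draw one triplet from a split dataframe.
+#   ds = TripletDataset("./data", train_samples, transform=None)
+#   anchor, positive, negative, label = ds[0]  # three (3, 256, 256) tensors and a label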
+ """ + def __init__(self, root_dir, items_df, transform:transforms.Compose=None): + + # get the image folder path + self.image_dir = (Path(root_dir) / 'image') + # get the labels dataframe + self.items_df = items_df + # Label names + self.classes = ['normal', 'melanoma'] + + # Standard image transformation to which we add supplied tranformations + self.transform = transforms.Compose( + (transform.transforms if transform else [])+ + #[transforms.ToPILImage(), + [transforms.ToTensor(), + transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])] + ) + + self.len = self.items_df.shape[0] + + def __len__(self): + return self.len + + def __getitem__(self, idx): + + # Get image information from the dataframe + item = self.items_df.iloc[idx] + # Get image label + label = item["target"] + # Get image + image_name = item["isic_id"] + image = Image.open(self.image_dir / (image_name + ".jpg")).convert('RGB') + # Transform image + image = self.transform(image) + + return image, torch.tensor(label, dtype=torch.long) \ No newline at end of file diff --git a/recognition/Siamese_Network_MAILLOT/modules.py b/recognition/Siamese_Network_MAILLOT/modules.py new file mode 100644 index 000000000..f53449c75 --- /dev/null +++ b/recognition/Siamese_Network_MAILLOT/modules.py @@ -0,0 +1,69 @@ +""" +Melissa Maillot - s4851573 +COMP3710 2025S2 - Report +modules.py - contains all neural network and custom loss function code +""" + +import torch +import torch.nn as nn +from torchvision import models + +class EmbeddingNet(nn.Module): + """ + Non-pretrained CNN to generate image embeddings. + Simple classifier head for classification + """ + def __init__(self, out_dim): + super(EmbeddingNet, self).__init__() + + # load ResNet50 model + resnet = models.resnet50() + + # change the feature extractor head + self.extractor = nn.Sequential(*list(resnet.children())[:-1]) + self.fc_out = nn.Sequential( + nn.Linear(2048, 512), + nn.ReLU(inplace=True), + nn.Dropout(0.3), # 0.5 + nn.Linear(512, 256), + nn.ReLU(inplace=True), + nn.Dropout(0.3), # 0.5 + nn.Linear(256, out_dim) + ) + + # classification head + self.classifier = nn.Linear(out_dim, 2) + + def forward(self, x): + # extract features + x = self.extractor(x) + # Flatten the feature map + x = x.view(x.size(0), -1) + # Final embedding output + x = self.fc_out(x) + + return x + + def classify(self, x): + # classifiy + return self.classifier(x) + + +class TripletLoss(nn.Module): + """ + Triplet loss function based on the distance between embeddings. 
+
+
+class TripletLoss(nn.Module):
+    """
+    Triplet loss function based on the L2 distance between embeddings.
+
+    L(A, P, N) = max(0, ||f(A) - f(P)|| - ||f(A) - f(N)|| + margin)
+    """
+    def __init__(self, margin=1.0):
+        super(TripletLoss, self).__init__()
+        self.margin = margin
+        self.p = 2  # L2 distance
+
+    def forward(self, anchor, positive, negative):
+        # Calculate L2 distances between the anchor and the positive/negative
+        d_pos = nn.functional.pairwise_distance(anchor, positive, p=self.p)
+        d_neg = nn.functional.pairwise_distance(anchor, negative, p=self.p)
+
+        # Triplet loss formula
+        loss = torch.relu(d_pos - d_neg + self.margin).mean()
+        return loss
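+
+# Worked example (illustrative): with margin=1.0, an anchor-positive distance
+# of 0.5 and an anchor-negative distance of 1.2 give
+# loss = max(0, 0.5 - 1.2 + 1.0) = 0.3; once negatives sit more than `margin`
+# farther from the anchor than positives do, the loss reaches 0.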
+ """ + + # Get a balanced sample of the test set + test_samples_subset = test_samples[test_samples["target"]== 0].sample(n=test_samples[test_samples["target"]== 1].shape[0]) + test_samples_subset = pd.concat([test_samples_subset, test_samples[test_samples["target"]== 1]], ignore_index=True) + + for i in ["Subset", ""]: + + # get the correct dataset + if i == "Subset": + test_dataset = SkinDataset(data_root, test_samples_subset, transform=None) + else: + test_dataset = SkinDataset(data_root, test_samples, transform=None) + + # get the data loader + test_loader = DataLoader(test_dataset, batch_size=256, shuffle=True, num_workers=0) + + model.eval() + with torch.no_grad(): + test_all_labels = [] + test_all_embeds = [] + test_all_predictions = [] + test_all_probs = [] + for i, (images, labels) in enumerate(test_loader): + images = images.to(device) + + # Get embeddings + embeddings = model(images) + + # classify embeddings + output = model.classify(embeddings) + + # Predictions and Probabilities + _, preds = torch.max(output, 1) + probs = torch.softmax(output, dim=1)[:, 1] # Probability of class 1 (Melanoma) + test_all_labels.extend(labels.cpu().numpy()) + test_all_embeds.extend(embeddings.cpu().numpy()) + test_all_predictions.extend(preds.cpu().numpy()) + test_all_probs.extend(probs.cpu().numpy()) + + # --- calculate metrics --- + test_acc = accuracy_score(test_all_labels, test_all_predictions) + test_auc = roc_auc_score(test_all_labels, test_all_probs) + test_aps = average_precision_score(test_all_labels, test_all_probs) + conf_matrix = confusion_matrix(test_all_labels, test_all_predictions) + tn, fp, fn, tp = conf_matrix.ravel() + sensitivity = tp / (tp + fn) + specificity = tn / (tn + fp) + + # get metrics + print(f"Test{" "+i if i else ""} Classification Accuracy: {test_acc:.4f}") + print(f"Test{" "+i if i else ""} ROC AUC: {test_auc:.4f}") + print(f"Test{" "+i if i else ""} Average Precision Score: {test_aps:.4f}") + print(f"Test{" "+i if i else ""} Sensitivity: {sensitivity:.4f}") + print(f"Test{" "+i if i else ""} Specificity: {specificity:.4f}") + + # --- plotting --- + # confusion matrix + cm = confusion_matrix(test_all_labels, test_all_predictions) + cm_display = ConfusionMatrixDisplay(cm).plot(cmap="Blues") + plt.tight_layout() + plt.savefig((Path(data_root) / ('confusion_matrix'+('_'+i if i else '')+'.png'))) + + # ROC AUC and precision-recall + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5)) + fpr, tpr, _ = roc_curve(test_all_labels, test_all_probs) + roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr).plot(ax=ax1) + ax1.set_title((i+" " if i else "")+"ROC curve") + prec, recall, _ = precision_recall_curve(test_all_labels, test_all_probs) + pr_display = PrecisionRecallDisplay(precision=prec, recall=recall).plot(ax=ax2) + ax2.set_title((i+" " if i else "")+"Precision-Recall curve") + plt.tight_layout() + plt.savefig((Path(data_root) / ('ROCAUC_PRC'+('_'+i if i else '')+'.png'))) + + # t-SNE manifold + tsne = TSNE(n_components=2, random_state=42) + embeddings_2d = tsne.fit_transform(np.array(test_all_embeds)) + plt.figure(figsize=(8, 6)) + scatter = plt.scatter(np.array(embeddings_2d)[:, 0], np.array(embeddings_2d)[:, 1], c=np.array(test_all_labels)) + plt.colorbar(scatter) + plt.title((i+" " if i else "")+'t-SNE visualisation of embeddings') + plt.tight_layout() + plt.savefig((Path(data_root) / ('testing_tsne_embeddings'+('_'+i if i else '')+'.png'))) \ No newline at end of file diff --git a/recognition/Siamese_Network_MAILLOT/train.py 
new file mode 100644
index 000000000..15ea739df
--- /dev/null
+++ b/recognition/Siamese_Network_MAILLOT/train.py
@@ -0,0 +1,330 @@
+"""
+Melissa Maillot - s4851573
+COMP3710 2025S2 - Report
+train.py - main file, trains and evaluates the siamese network
+"""
+
+import torch
+import torch.nn as nn
+import torch.optim as optim
+from torch.utils.data import DataLoader
+from torchvision import transforms
+import numpy as np
+from sklearn.metrics import roc_auc_score, accuracy_score, average_precision_score
+import random
+from pathlib import Path
+import matplotlib.pyplot as plt
+
+from dataset import split_data, TripletDataset
+from modules import EmbeddingNet, TripletLoss
+from predict import test_set_evaluation
+
+# --- Hyperparameters ---
+DATA_ROOT = './data'  # IMPORTANT: DATA_ROOT must contain train-metadata.csv and the image/ folder
+IMAGE_SIZE = 256
+EMBEDDING_DIM = 128
+MARGIN = 1.25
+NUM_EPOCHS = 20
+LEARNING_RATE = 1e-4
+TRAIN_DATA_SUBSET_FRACTION = 0.3
+TRAIN_BATCH_SIZE = 32
+VAL_TEST_BATCH_SIZE = 256
+
+def train_epoch(model, dataloader, criterion, classification_crit, optimizer, scheduler, device):
+    """
+    Trains one epoch of the model.
+    Returns epoch training metrics:
+    average embedding loss, average classification loss, classification accuracy, ROC AUC, AP score
+    """
+    model.train()
+
+    all_labels = []
+    all_predictions = []
+    all_probs = []
+    emb_running_loss = 0.0
+    class_running_loss = 0.0
+    total_samples = 0
+
+    for i, (img_a, img_p, img_n, label_a) in enumerate(dataloader):
+
+        img_a, img_p, img_n, label_a = img_a.to(device), img_p.to(device), img_n.to(device), label_a.to(device)
+
+        optimizer.zero_grad()
+
+        # Get embeddings
+        emb_a = model(img_a)
+        emb_p = model(img_p)
+        emb_n = model(img_n)
+        # Calculate triplet loss
+        emb_loss = criterion(emb_a, emb_p, emb_n)
+
+        # classify anchors
+        out_a = model.classify(emb_a)
+        # Calculate classification loss
+        class_loss = classification_crit(out_a, label_a)
+
+        # total loss and update weights
+        loss = emb_loss + class_loss
+        loss.backward()
+        optimizer.step()
+
+        # loss logging
+        total_samples += img_a.size(0)
+        emb_running_loss += emb_loss.item() * img_a.size(0)
+        class_running_loss += class_loss.item() * img_a.size(0)
+
+        # Predictions and Probabilities
+        _, preds = torch.max(out_a, 1)
+        probs = torch.softmax(out_a, dim=1)[:, 1]  # Probability of class 1 (Melanoma)
+        all_labels.extend(label_a.cpu().numpy())
+        all_predictions.extend(preds.cpu().numpy())
+        all_probs.extend(probs.cpu().detach().numpy())
+
+        if (i + 1) % 50 == 0:
+            print(f'Batch {i+1}/{len(dataloader)}, Embedding training loss: {(emb_loss.item()):.4f}, Classification training loss: {(class_loss.item()):.4f}')
+
+        # scheduler step is specifically for the OneCycleLR scheduler used
+        # if using a different scheduler, change when the step happens
+        if (i + 1) % 100 == 0:
+            scheduler.step()
+
+    # calculate metrics
+    emb_epoch_loss = emb_running_loss / total_samples
+    class_epoch_loss = class_running_loss / total_samples
+    acc = accuracy_score(all_labels, all_predictions)
+    auc = roc_auc_score(all_labels, all_probs)
+    aps = average_precision_score(all_labels, all_probs)
+    return emb_epoch_loss, class_epoch_loss, acc, auc, aps
+
+def evaluate(model, dataloader, criterion, classification_crit, device):
+    """
+    Evaluates one epoch of the model on the provided data (usually the validation set).
+    Returns epoch validation metrics:
+    average embedding loss, average classification loss, classification accuracy, ROC AUC, AP score
+    """
+    model.eval()
+
+    all_labels = []
+    all_predictions = []
+    all_probs = []
+    emb_running_loss = 0.0
+    class_running_loss = 0.0
+    total_samples = 0
+
+    with torch.no_grad():
+        for img_a, img_p, img_n, label_a in dataloader:
+
+            img_a, img_p, img_n, label_a = img_a.to(device), img_p.to(device), img_n.to(device), label_a.to(device)
+
+            # Get embeddings
+            emb_a = model(img_a)
+            emb_p = model(img_p)
+            emb_n = model(img_n)
+            # Calculate triplet loss
+            emb_loss = criterion(emb_a, emb_p, emb_n)
+
+            # classify anchors
+            out_a = model.classify(emb_a)
+            # Calculate classification loss
+            class_loss = classification_crit(out_a, label_a)
+
+            # loss logging
+            total_samples += img_a.size(0)
+            emb_running_loss += emb_loss.item() * img_a.size(0)
+            class_running_loss += class_loss.item() * img_a.size(0)
+
+            # Predictions and Probabilities
+            _, preds = torch.max(out_a, 1)
+            probs = torch.softmax(out_a, dim=1)[:, 1]  # Probability of class 1 (Melanoma)
+            all_labels.extend(label_a.cpu().numpy())
+            all_predictions.extend(preds.cpu().numpy())
+            all_probs.extend(probs.cpu().detach().numpy())
+
+    # calculate metrics
+    emb_epoch_loss = emb_running_loss / total_samples
+    class_epoch_loss = class_running_loss / total_samples
+    acc = accuracy_score(all_labels, all_predictions)
+    auc = roc_auc_score(all_labels, all_probs)
+    aps = average_precision_score(all_labels, all_probs)
+    return emb_epoch_loss, class_epoch_loss, acc, auc, aps
+
+def main():
+    """
+    Runs training of the Siamese network for classification of the ISIC 2020 data.
+    Training is performed, then evaluation results for the trained model are produced.
+    """
+
+    # --- Configuration ---
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    print(f"Using device: {device}")
+    print("\n")
+    # Set Seed
+    SEED = 48515739
+    random.seed(SEED)
+    np.random.seed(SEED)
+
+    # fetch dataframes referencing the data and split between train, validation and test
+    train_samples, val_samples, test_samples = split_data(DATA_ROOT)
+    # take only a subset of the training set
+    train_samples = train_samples.sample(frac=TRAIN_DATA_SUBSET_FRACTION).reset_index(drop=True)
+    print(f"Number of normal samples in training data subset: {train_samples[train_samples['target'] == 0].shape[0]}")
+    print(f"Number of melanoma samples in training data subset: {train_samples[train_samples['target'] == 1].shape[0]}")
+
+    # Setup DataLoaders
+    # add additional transformations to the training set
+    train_dataset = TripletDataset(DATA_ROOT, train_samples,
+                                   transform=transforms.Compose([
+                                       transforms.RandomRotation(degrees=10, fill=(255, 255, 255)),
+                                       transforms.RandomHorizontalFlip(p=0.5),
+                                       transforms.RandomVerticalFlip(p=0.5),
+                                       transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05)
+                                   ]))
+
+    val_dataset = TripletDataset(DATA_ROOT, val_samples, transform=None)
+
+    train_loader = DataLoader(train_dataset, batch_size=TRAIN_BATCH_SIZE, shuffle=True, num_workers=0)
+    val_loader = DataLoader(val_dataset, batch_size=VAL_TEST_BATCH_SIZE, shuffle=True, num_workers=0)
+
+    # Setup Model, Loss, Optimizer
+    model = EmbeddingNet(out_dim=EMBEDDING_DIM).to(device)
+    criterion = TripletLoss(margin=MARGIN).to(device)
+    classifier_crit = nn.CrossEntropyLoss().to(device)
+    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
+    scheduler = optim.lr_scheduler.OneCycleLR(optimizer,
+                                              max_lr=LEARNING_RATE,
+                                              steps_per_epoch=train_samples.shape[0]//TRAIN_BATCH_SIZE//100,
+                                              epochs=NUM_EPOCHS,
+                                              anneal_strategy="cos")
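+
+    # Note: train_epoch() steps this scheduler once every 100 batches; this is
+    # why steps_per_epoch above is divided by 100, keeping the one-cycle
+    # schedule aligned with the actual number of scheduler steps per epoch.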
+
+    # --- Training Loop ---
+    # metric logging
+    best_val_AP_score = -1.0
+    emb_train_loss_log = []
+    emb_val_loss_log = []
+    clas_train_loss_log = []
+    clas_val_loss_log = []
+    train_accuracy_log = []
+    val_accuracy_log = []
+    train_ROC_AUC_log = []
+    val_ROC_AUC_log = []
+    train_AP_score_log = []
+    val_AP_score_log = []
+
+    print("\n--- Starting Training ---")
+    for epoch in range(1, NUM_EPOCHS + 1):
+        # Train
+        print(f"\n==== Training Epoch {epoch} ====")
+        emb_train_loss, class_train_loss, train_acc, train_auc, train_aps = train_epoch(model, train_loader, criterion, classifier_crit, optimizer, scheduler, device)
+
+        # Print training metrics
+        print(f"Epoch {epoch} training finished.")
+        print(f"Average Training Embedding Loss: {emb_train_loss:.4f}")
+        print(f"Average Training Classification Loss: {class_train_loss:.4f}")
+        print(f"Training Classification Accuracy: {train_acc:.4f}")
+        print(f"Training ROC AUC: {train_auc:.4f}")
+        print(f"Training Average Precision Score: {train_aps:.4f}")
+
+        # Log training metrics
+        emb_train_loss_log.append(emb_train_loss)
+        clas_train_loss_log.append(class_train_loss)
+        train_accuracy_log.append(train_acc)
+        train_ROC_AUC_log.append(train_auc)
+        train_AP_score_log.append(train_aps)
+
+        # Evaluate
+        emb_val_loss, class_val_loss, val_acc, val_auc, val_aps = evaluate(model, val_loader, criterion, classifier_crit, device)
+
+        print("--- Validation phase ---")
+        # Print validation metrics
+        print(f"Average Validation Embedding Loss: {emb_val_loss:.4f}")
+        print(f"Average Validation Classification Loss: {class_val_loss:.4f}")
+        print(f"Validation Classification Accuracy: {val_acc:.4f}")
+        print(f"Validation ROC AUC: {val_auc:.4f}")
+        print(f"Validation Average Precision Score: {val_aps:.4f}")
+
+        # Log validation metrics
+        emb_val_loss_log.append(emb_val_loss)
+        clas_val_loss_log.append(class_val_loss)
+        val_accuracy_log.append(val_acc)
+        val_ROC_AUC_log.append(val_auc)
+        val_AP_score_log.append(val_aps)
+
+        # Save best model
+        # We choose the best model on the basis of the highest validation average precision score
+        # That way we hope to limit false negatives
+        if val_aps > best_val_AP_score:
+            print(f"Previous best average precision score: {best_val_AP_score:.4f}")
+            best_val_AP_score = val_aps
+            print("Saving best model...")
+            torch.save(model.state_dict(), (Path(DATA_ROOT) / 'best_siamese_model.pth'))
+
+    print("\n--- Training Finished ---")
+    print(f"Best Validation Average Precision Score: {best_val_AP_score:.4f}")
+
+    # --- Training visualisation ---
+    # Plot loss over epochs
+    plt.figure(figsize=(10, 5))
+
+    plt.subplot(1, 2, 1)
+    plt.plot(range(NUM_EPOCHS), emb_train_loss_log, label='Train Loss', color='#97a6c4')
+    plt.plot(range(NUM_EPOCHS), emb_val_loss_log, label='Validation Loss', color='#384860')
+    plt.title('Embedding Loss over Epochs')
+    plt.xlabel('Epochs')
+    plt.ylabel('Loss')
+    plt.legend()
+
+    plt.subplot(1, 2, 2)
+    plt.plot(range(NUM_EPOCHS), clas_train_loss_log, label='Train Loss', color='#97a6c4')
+    plt.plot(range(NUM_EPOCHS), clas_val_loss_log, label='Validation Loss', color='#384860')
+    plt.title('Classification Loss over Epochs')
+    plt.xlabel('Epochs')
+    plt.ylabel('Loss')
+    plt.legend()
+
+    plt.tight_layout()
+    plt.savefig((Path(DATA_ROOT) / 'loss_logs.png'))
+    plt.show()
+    plt.close()
+
+    # plot metrics over epochs
+    plt.figure(figsize=(15, 5))
+
+    plt.subplot(1, 3, 1)
+    plt.plot(range(NUM_EPOCHS), train_accuracy_log, label='Train Accuracy', color='#97a6c4')
+    plt.plot(range(NUM_EPOCHS), val_accuracy_log, label='Validation Accuracy', color='#384860')
+    plt.title('Classification Accuracy over Epochs')
+    plt.xlabel('Epochs')
+    plt.ylabel('Accuracy')
+    plt.legend()
+
+    plt.subplot(1, 3, 2)
+    plt.plot(range(NUM_EPOCHS), train_ROC_AUC_log, label='Train ROC AUC', color='#97a6c4')
+    plt.plot(range(NUM_EPOCHS), val_ROC_AUC_log, label='Validation ROC AUC', color='#384860')
+    plt.title('ROC AUC over Epochs')
+    plt.xlabel('Epochs')
+    plt.ylabel('ROC AUC')
+    plt.legend()
+
+    plt.subplot(1, 3, 3)
+    plt.plot(range(NUM_EPOCHS), train_AP_score_log, label='Train AP Score', color='#97a6c4')
+    plt.plot(range(NUM_EPOCHS), val_AP_score_log, label='Validation AP Score', color='#384860')
+    plt.title('Average Precision Score over Epochs')
+    plt.xlabel('Epochs')
+    plt.ylabel('AP Score')
+    plt.legend()
+
+    plt.tight_layout()
+    plt.savefig((Path(DATA_ROOT) / 'metrics_logs.png'))
+    plt.show()
+    plt.close()
+
+    # --- Model evaluation ---
+    # Load best model
+    model.load_state_dict(torch.load((Path(DATA_ROOT) / 'best_siamese_model.pth')))
+
+    # get evaluation metrics
+    test_set_evaluation(model, test_samples, device, DATA_ROOT)
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file