Merged
27 commits
66d27a4
code refactoring and removal of deprecated files
Vicbi Jul 11, 2025
a241827
clean model scripts and renaming
Vicbi Jul 11, 2025
665f18a
major refactoring of model comparison scripts
Vicbi Jul 11, 2025
424e06b
major refactoring of model comparison scripts
Vicbi Jul 11, 2025
ef80a97
pylint fix + license update
Vicbi Jul 11, 2025
903551a
pylint fix + license update
Vicbi Jul 11, 2025
fc54bd0
Add .yamllint file
Vicbi Jul 11, 2025
86da609
pylint
Vicbi Jul 11, 2025
f0839d2
refactor and restructure
Vicbi Jul 12, 2025
68b39e3
refactor and restructure
Vicbi Jul 12, 2025
83908e3
refactor
Vicbi Jul 12, 2025
ebe690e
update README.md
Vicbi Jul 12, 2025
0a20004
correct bash script
Vicbi Jul 12, 2025
f9a82de
remove duplicate from .yamllint file
Vicbi Jul 12, 2025
7324dd7
restore history
Vicbi Oct 18, 2025
1343fbc
resolve conflicts
Vicbi Oct 18, 2025
ed69142
resolve conflicts
Vicbi Oct 18, 2025
ac08385
demo for transformation implementation for ISIC
Vicbi Oct 18, 2025
f478460
update demo
Vicbi Oct 18, 2025
6cea147
add license
Vicbi Oct 18, 2025
9f869d5
add license
Vicbi Oct 18, 2025
2e313ec
add demo
Vicbi Oct 20, 2025
7951ce1
implement training task abstraction and wrappers level for training
Vicbi Oct 21, 2025
079984f
update hydra-core version
Vicbi Oct 21, 2025
55776f7
refactor(data): consolidate dataset logic into dataset_factory and im…
Vicbi Oct 22, 2025
b59150d
docs(train): document support for Hydra-based hyperparameter sweeps -…
Vicbi Oct 22, 2025
681986c
- Updated trial.py to load ISICBaseDataset without transform for corr…
Vicbi Oct 22, 2025
Binary file removed .DS_Store
Binary file not shown.
45 changes: 45 additions & 0 deletions .github/workflows/build-and-test.yml
@@ -0,0 +1,45 @@
#
# This source file is part of the Daneshjou Lab projects
#
# SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see CONTRIBUTORS.md)
#
# SPDX-License-Identifier: MIT
#

name: Build and Test

on:
  push:
    branches:
      - main
  pull_request:
  workflow_dispatch:
  workflow_call:

jobs:
  pylint:
    name: PyLint
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - name: Install Infrastructure
        run: |
          pip install -r requirements.txt
          pip install pylint
      - name: Analysing the code with pylint
        run: |
          pylint $(git ls-files '*.py')
  black_lint:
    name: Black Code Formatter Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - name: Install Black
        run: pip install black[jupyter]
      - name: Check code formatting with Black
        run: black . --exclude '\.ipynb$'
37 changes: 37 additions & 0 deletions .github/workflows/pull_request.yml
@@ -0,0 +1,37 @@
#
# This source file is part of the Daneshjou Lab projects
#
# SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see CONTRIBUTORS.md)
#
# SPDX-License-Identifier: MIT
#

name: Pull Request

on:
  pull_request:
  workflow_dispatch:

jobs:
  reuse_action:
    name: REUSE Compliance Check
    uses: DaneshjouLab/.github/.github/workflows/reuse.yml@main
  markdown_link_check:
    name: Markdown Link Check
    uses: DaneshjouLab/.github/.github/workflows/markdown-link-check.yml@main
  yamllint:
    name: YAML Lint Check
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2

      - name: Install yamllint
        run: pip install yamllint

      - name: Run yamllint with custom config
        run: yamllint -c .yamllint .github/workflows/*.yml
10 changes: 8 additions & 2 deletions .gitignore
@@ -1,7 +1,14 @@
# This source file is part of the Daneshjou Lab projects
#
# SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see AUTHORS.md)
#
# SPDX-License-Identifier: MIT

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class
.python-version

# C extensions
*.so
@@ -206,5 +213,4 @@ marimo/_static/
marimo/_lsp/
__marimo__/


.DS_Store
**/.DS_Store
11 changes: 11 additions & 0 deletions .reuse/dep5.txt
@@ -0,0 +1,11 @@
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

Files: media/*.png
Copyright: 2025 Stanford University and the project authors (see CONTRIBUTORS.md)
License: MIT
Comment: All files are part of the Daneshjou Lab projects.

Files: results/*.json
Copyright: 2025 Stanford University and the project authors (see CONTRIBUTORS.md)
License: MIT
Comment: All files are part of the Daneshjou Lab projects.
13 changes: 13 additions & 0 deletions .yamllint
@@ -0,0 +1,13 @@
---
extends: default

rules:
  truthy:
    level: warning
    allowed-values: ["false", "true", "on", "off"]
  document-start:
    level: warning
    present: false
  line-length:
    max: 180
    level: warning
5 changes: 5 additions & 0 deletions .yamllint.license
@@ -0,0 +1,5 @@
# This source file is part of the ARPA-H CARE LLM project
#
# SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see AUTHORS.md)
#
# SPDX-License-Identifier: MIT
18 changes: 18 additions & 0 deletions LICENSES/MIT.txt
@@ -0,0 +1,18 @@
MIT License

Copyright (c) <year> <copyright holders>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.
111 changes: 110 additions & 1 deletion README.md
@@ -1,8 +1,14 @@
<!-- This source file is part of the DaneshjouLab projects

SPDX-FileCopyrightText: 2025 Stanford University

SPDX-License-Identifier: MIT
-->

# Finetuning Pretrained Models for Compressed Dermatology Image Analysis

This project explores how compressed and degraded dermatology images (from the ISIC 2019 dataset) affect classification performance using pretrained vision models. It compares fine-tuning vs. linear probing across multiple JPEG quality levels.

![System architecture diagram](<./CS231N Poster.png>)

## Project Goals

@@ -30,6 +36,14 @@ This project explores how compressed and degraded dermatology images (from the I
reduced-perception/
├── configs/
│   └── example_config.yaml          # Configs for job submissions
compressed-perception/
├── README.md                        # Project overview & documentation
├── LICENSES/                        # Directory containing license files (REUSE compliance)
│   └── MIT.txt                      # MIT license text
├── pyproject.toml                   # Python packaging config
├── setup.py                         # Installation script for the package
├── setup.cfg                        # Configuration for setup tools
├── scripts/                         # Lightweight utility or shell scripts
│   ├── download_unpack_isic2019.sh  # Downloads and unpacks ISIC data
@@ -74,6 +88,101 @@ reduced-perception/

3. View results
We use Weights & Biases for logging, so output plots can be viewed there.
├── requirements.txt                     # Dependencies file
├── requirements.txt.license             # Dependencies file license
├── .yamllint                            # YAML linter configuration
├── .yamllint.license                    # YAML linter configuration license
├── .github/                             # GitHub specific files
│   └── workflows/                       # CI/CD workflow definitions
│       ├── build-and-test.yml
│       └── pull_request.yml
├── .reuse/                              # REUSE compliance configuration
│   └── dep5                             # Copyright and license information
├── docs/                                # Documentation
│   └── pipeline.md                      # Pipeline documentation
├── scripts/                             # Standalone scripts
│   ├── ...
│   └── visualize_isic_results.py        # Visualize metrics for model comparison (TODO)
├── configs/                             # Configuration files (TODO)
│   ├── datasets/                        # Dataset configs
│   │   └── isic2019.yaml                # ISIC 2019 dataset config
│   ├── models/                          # Model configs
│   │   ├── vit.yaml                     # ViT model config
│   │   ├── dinov2.yaml                  # DINOv2 model config
│   │   └── simclr.yaml                  # SimCLR model config
│   ├── experiments/                     # Experiment configs
│   │   ├── baseline.yaml                # Baseline experiment
│   │   └── lr_sweep.yaml                # Learning rate sweep experiment
│   └── example_config.yaml              # Example configuration file
├── tests/                               # Test suite (TODO)
│   ├── unit/                            # Unit tests
│   │   └── test_transforms.py
│   ├── integration/                     # Integration tests
│   │   └── test_pipeline.py
│   └── conftest.py                      # Test fixtures and configuration
├── src/
│   └── compressed_perception/           # Main package
│       ├── __init__.py                  # Package initialization
│       │
│       ├── models/                      # Model implementations
│       │   ├── __init__.py
│       │   ├── architectures/           # Model architecture definitions
│       │   │   ├── __init__.py
│       │   │   ├── vit.py               # Vision Transformer adaptations
│       │   │   └── simclr.py            # SimCLR adaptations
│       │   │
│       │   ├── evaluation/              # Model evaluation code
│       │   │   ├── __init__.py
│       │   │   └── metrics.py           # Evaluation metrics
│       │   │
│       │   ├── comparison/              # Model comparison utilities
│       │   │   ├── __init__.py
│       │   │   ├── compare_baseline.py  # Baseline comparison
│       │   │   └── compare_lr_sweep.py  # Learning rate sweeping
│       │   │
│       │   ├── training/                # Training infrastructure
│       │   │   ├── __init__.py
│       │   │   ├── trainers.py          # Training loops
│       │   │   └── callbacks.py         # Training callbacks
│       │   │
│       │   └── utils/                   # Model utilities
│       │       ├── __init__.py
│       │       ├── constants.py         # Model constants
│       │       └── helpers.py           # Helper functions
│       │
│       ├── modules/                     # Reusable modules
│       │   ├── __init__.py
│       │   ├── transforms/              # Image transformations
│       │   │   ├── __init__.py
│       │   │   ├── degradation.py       # Image degradation transforms
│       │   │   └── augmentation.py      # Data augmentation transforms
│       │   │
│       │   └── data_preparation/        # Data preparation utilities
│       │       ├── __init__.py
│       │       └── preparation.py       # Dataset preparation
│       │
├── results/
├── jobs/                                # Cluster job submission files
│   ├── job_template.slurm               # SLURM job template
│   ├── run.sh                           # General run script
│   ├── run_compare_baseline.sh          # Baseline model comparison script
│   ├── run_compare_lr_sweep.sh          # Learning rate sweep script
│   └── configs/                         # Job configurations
└── media/                               # Media files for documentation
    └── CS231N Poster.png                # Project poster
```

## 📦 Dataset

72 changes: 72 additions & 0 deletions docs/pipeline.md
@@ -0,0 +1,72 @@
<!-- This source file is part of the Daneshjou Lab projects

SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see AUTHORS.md)

SPDX-License-Identifier: MIT
-->

# Model Comparison Pipeline

## Overview

This script (`model_comparison_models.py`) provides a baseline for comparing different image classification models at various image compression levels, including the original images. It supports fine-tuning, linear probing, and optional image degradation transforms.

---

## Features

- **Model Support:** Vision Transformer (ViT), DINOv2, SimCLR (self-supervised backbone)
- **Data Augmentation:** Optional JPEG compression, Gaussian blur, and color quantization
- **Dataset Balancing:** Ensures equal samples per class for fair comparison
- **Training & Evaluation:** Uses Hugging Face Trainer for streamlined workflows
- **Experiment Tracking:** Integrated with Weights & Biases (`wandb`)
- **GPU Monitoring:** Optional support via `pynvml`
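
As an illustration of one of the degradations listed above, color quantization can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the pipeline's actual transform: it assumes quantization means snapping each 8-bit RGB channel to a fixed number of evenly spaced levels, and the names `quantize_channel` and `quantize_image` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # one RGB pixel, 8 bits per channel


def quantize_channel(value: int, levels: int) -> int:
    """Snap an 8-bit intensity to the nearest of `levels` evenly spaced values."""
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    if not 0 <= value <= 255:
        raise ValueError("value must be an 8-bit intensity (0-255)")
    # Find the index of the nearest level, then map it back to the 0-255 range.
    index = round(value * (levels - 1) / 255)
    return round(index * 255 / (levels - 1))


def quantize_image(pixels: List[List[Pixel]], levels: int) -> List[List[Pixel]]:
    """Return a new image (rows of RGB tuples) with every channel quantized."""
    return [
        [tuple(quantize_channel(channel, levels) for channel in pixel) for pixel in row]
        for row in pixels
    ]
```

With `levels=2` each channel collapses to either 0 or 255, so a pixel such as `(0, 127, 255)` becomes `(0, 0, 255)`. A real pipeline would more likely apply this kind of transform to image tensors or PIL images rather than nested lists.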

---

## Workflow

1. **Environment Setup**
   - Loads required libraries and sets up cache directories.
   - Checks for GPU availability.

2. **Dataset Loading & Balancing**
   - Loads the ISIC_2019_224 dataset.
   - Balances the dataset across filtered classes.

3. **Model Initialization**
   - Initializes model and preprocessor based on configuration.

4. **Preprocessing & Augmentation**
   - Applies resizing, normalization, and optional degradation transforms.

5. **Training & Evaluation**
   - Splits data into training and validation sets.
   - Trains and evaluates each model, logging results to `wandb`.
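
The balancing and splitting steps of the workflow above can be sketched as follows. This is an illustrative sketch, not the script's actual code: it assumes balancing is done by undersampling every class to the size of the smallest one, and the function names, the `(image_id, label)` sample layout, and the default `val_fraction` are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[str, Hashable]  # (image identifier, class label)


def balance_by_class(samples: Sequence[Sample], seed: int = 0) -> List[Sample]:
    """Undersample so that every class keeps as many samples as the rarest class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    if not by_class:
        return []
    per_class = min(len(group) for group in by_class.values())
    balanced: List[Sample] = []
    # Sort labels so the result is reproducible for a given seed.
    for label in sorted(by_class, key=str):
        balanced.extend(rng.sample(by_class[label], per_class))
    rng.shuffle(balanced)
    return balanced


def train_val_split(
    samples: Sequence[Sample], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the samples and hold out `val_fraction` of them for validation."""
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

Undersampling is only one way to obtain equal samples per class; it discards data from the larger classes, which may or may not match what the script does. Fixing the seed keeps the comparison between models reproducible, since each one then sees the same split.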

---

## Usage

```bash
python model_comparison_models.py --resolution 224 --batch_size 256 --num_train_images 25000 --num_epochs 10 --eval_steps 10
```
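
For reference, the flags in the command above could be parsed with `argparse` roughly as follows. This is a sketch rather than the script's real argument handling: the flag names come from the usage example, while the defaults (copied from that example), the help strings, and the `build_parser` and `parse_args` helpers are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the usage example."""
    parser = argparse.ArgumentParser(
        description="Compare image classification models across compression levels."
    )
    # Defaults mirror the example command; the real script may use other values.
    parser.add_argument("--resolution", type=int, default=224,
                        help="Input image resolution in pixels.")
    parser.add_argument("--batch_size", type=int, default=256,
                        help="Batch size used for training and evaluation.")
    parser.add_argument("--num_train_images", type=int, default=25000,
                        help="Number of training images to use.")
    parser.add_argument("--num_epochs", type=int, default=10,
                        help="Number of training epochs.")
    parser.add_argument("--eval_steps", type=int, default=10,
                        help="Run evaluation every N training steps.")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv`, or the process arguments when `argv` is None."""
    return build_parser().parse_args(argv)
```

Passing an explicit list, for example `parse_args(["--batch_size", "64"])`, makes the parser easy to exercise in tests without touching `sys.argv`.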

## Configuration
- Models: Edit the `models` list in the script to add or modify model configurations.
- Transforms: Toggle `apply_transforms` in `prepare_datasets()` to enable or disable augmentations.
- Hyperparameters: Adjust the arguments in the `main()` function for batch size, epochs, etc.

## Output
- Training and evaluation metrics are printed to the console and logged to Weights & Biases.
- Results can be used for further analysis or ablation studies.

## Extending
- Add new models by updating the `models` list.
- Implement new transforms in `utils/transforms.py`.
- Add new datasets by modifying the dataset loading logic.
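
One way to keep such extensions manageable is a small registry that maps transform names to functions, so a new transform is added in one place and selected by name. The sketch below is purely hypothetical and not taken from the project: `TRANSFORMS`, `register_transform`, and `apply_named_transforms` are invented names, and a flat list of 8-bit intensities stands in for an image.

```python
from typing import Callable, Dict, List, Sequence

Transform = Callable[[List[int]], List[int]]

# Hypothetical registry: transform name -> function on a flat list of intensities.
TRANSFORMS: Dict[str, Transform] = {}


def register_transform(name: str) -> Callable[[Transform], Transform]:
    """Decorator that records a transform under `name`."""
    def decorator(fn: Transform) -> Transform:
        if name in TRANSFORMS:
            raise ValueError(f"transform {name!r} is already registered")
        TRANSFORMS[name] = fn
        return fn
    return decorator


@register_transform("invert")
def invert(pixels: List[int]) -> List[int]:
    """Invert 8-bit intensities."""
    return [255 - p for p in pixels]


@register_transform("threshold")
def threshold(pixels: List[int]) -> List[int]:
    """Binarize 8-bit intensities at the midpoint."""
    return [255 if p >= 128 else 0 for p in pixels]


def apply_named_transforms(pixels: List[int], names: Sequence[str]) -> List[int]:
    """Apply the registered transforms in the order given by `names`."""
    for name in names:
        pixels = TRANSFORMS[name](pixels)
    return pixels
```

With this layout, enabling a new transform for an experiment only requires adding its name to a list, which could live in a configuration file.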

## References
- Hugging Face Transformers
- PyTorch
- Weights & Biases
- ISIC 2019 Dataset