DaneshjouLab · Vicbi · Oct 28, 2025 · Jul 11, 2025 · Jul 11, 2025 · Jul 11, 2025
diff --git a/.DS_Store b/.DS_Store
diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml
@@ -0,0 +1,47 @@
+#
+# This source file is part of the Daneshjou Lab projects
+#
+# SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see CONTRIBUTORS.md)
+#
+# SPDX-License-Identifier: MIT
+#
+
+name: Build and Test
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+  workflow_dispatch:
+  workflow_call:
+
+jobs:
+  pylint:
+    name: PyLint
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install Infrastructure
+        run: |
+          pip install -r requirements.txt
+          pip install pylint
+      - name: Analysing the code with pylint
+        run: |
+          pylint $(git ls-files '*.py')
+  black_lint:
+    name: Black Code Formatter Check
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+      - name: Install Black
+        run: pip install black[jupyter]
+      - name: Check code formatting with Black
+        run: black --check . --exclude '\.ipynb$'
diff --git a/.github/workflows/pull_request.yml b/.github/workflows/pull_request.yml
@@ -0,0 +1,37 @@
+#
+# This source file is part of the Daneshjou Lab projects
+#
+# SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see CONTRIBUTORS.md)
+#
+# SPDX-License-Identifier: MIT
+#
+
+name: Pull Request
+
+on:
+  pull_request:
+  workflow_dispatch:
+
+jobs:
+  reuse_action:
+    name: REUSE Compliance Check
+    uses: DaneshjouLab/.github/.github/workflows/reuse.yml@main
+  markdown_link_check:
+    name: Markdown Link Check
+    uses: DaneshjouLab/.github/.github/workflows/markdown-link-check.yml@main
+  yamllint:
+    name: YAML Lint Check
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+
+      - name: Install yamllint
+        run: pip install yamllint
+
+      - name: Run yamllint with custom config
+        run: yamllint -c .yamllint .github/workflows/*.yml
diff --git a/.gitignore b/.gitignore
@@ -1,7 +1,14 @@
+# This source file is part of the Daneshjou Lab projects
+#
+# SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see AUTHORS.md)
+#
+# SPDX-License-Identifier: MIT
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[codz]
 *$py.class
+.python-version
 
 # C extensions
 *.so
@@ -206,5 +213,4 @@ marimo/_static/
 marimo/_lsp/
 __marimo__/
 
-
-.DS_Store
+**/.DS_Store
diff --git a/.reuse/dep5.txt b/.reuse/dep5.txt
@@ -0,0 +1,11 @@
+Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+
+Files: media/*.png
+Copyright: 2025 Stanford University and the project authors (see CONTRIBUTORS.md)
+License: MIT
+Comment: All files are part of the Daneshjou Lab projects.
+
+Files: results/*.json
+Copyright: 2025 Stanford University and the project authors (see CONTRIBUTORS.md)
+License: MIT
+Comment: All files are part of the Daneshjou Lab projects.
diff --git a/.yamllint b/.yamllint
@@ -0,0 +1,13 @@
+---
+extends: default
+
+rules:
+  truthy:
+    level: warning
+    allowed-values: ["false", "true", "on", "off"]
+  document-start:
+    level: warning
+    present: false
+  line-length:
+    max: 180
+    level: warning
diff --git a/.yamllint.license b/.yamllint.license
@@ -0,0 +1,5 @@
+# This source file is part of the Daneshjou Lab projects
+#
+# SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see AUTHORS.md)
+#
+# SPDX-License-Identifier: MIT
diff --git a/LICENSES/MIT.txt b/LICENSES/MIT.txt
@@ -0,0 +1,18 @@
+MIT License
+
+Copyright (c) <year> <copyright holders>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
+associated documentation files (the "Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the
+following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial
+portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
+LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
+EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/README.md b/README.md
@@ -1,8 +1,14 @@
+<!-- This source file is part of the Daneshjou Lab projects
+
+SPDX-FileCopyrightText: 2025 Stanford University
+
+SPDX-License-Identifier: MIT
+-->
+
 # Finetuning Pretrained Models for Compressed Dermatology Image Analysis
 
 This project explores how compressed and degraded dermatology images (from the ISIC 2019 dataset) affect classification performance using pretrained vision models. It compares fine-tuning vs. linear probing across multiple JPEG quality levels.
 
-![System architecture diagram](<./CS231N Poster.png>)
 
 ## Project Goals
 
@@ -30,6 +36,14 @@ This project explores how compressed and degraded dermatology images (from the I
 reduced-perception/
 ├── configs/
 │   └── example_config.yaml          # Configs for job submissions
+compressed-perception/
+├── README.md                    # Project overview & documentation
+├── LICENSES/                    # Directory containing license files (REUSE compliance)
+│   └── MIT.txt                  # MIT license text
+│
+├── pyproject.toml               # Python packaging config
+├── setup.py                     # Installation script for the package
+├── setup.cfg                    # Configuration for setup tools
 │
 ├── scripts/                         # Lightweight utility or shell scripts
 │   ├── download_unpack_isic2019.sh  # Downloads and unpacks ISIC data
@@ -74,6 +88,101 @@ reduced-perception/
 
 3. View results
    We use weights and biases for logging, so output plots can be seen there
+├── requirements.txt             # Dependencies file
+├── requirements.txt.license     # Dependencies file license
+├── .yamllint                    # YAML linter configuration
+├── .yamllint.license            # YAML linter configuration license
+│
+├── .github/                     # GitHub specific files
+│   └── workflows/               # CI/CD workflow definitions
+│       ├── build-and-test.yml
+│       └── pull_request.yml
+│
+├── .reuse/                      # REUSE compliance configuration
+│   └── dep5                     # Copyright and license information
+│
+├── docs/                        # Documentation
+│   └── pipeline.md              # Pipeline documentation
+│
+├── scripts/                        # Standalone scripts
+│   ├── ...
+│   └── visualize_isic_results.py   # Visualize metrics for model comparison (TODO)
+│
+├── configs/                     # Configuration files (TODO)
+│   ├── datasets/                # Dataset configs
+│   │   └── isic2019.yaml        # ISIC 2019 dataset config
+│   ├── models/                  # Model configs
+│   │   ├── vit.yaml             # ViT model config
+│   │   ├── dinov2.yaml          # DINOv2 model config
+│   │   └── simclr.yaml          # SimCLR model config
+│   ├── experiments/             # Experiment configs
+│   │   ├── baseline.yaml        # Baseline experiment
+│   │   └── lr_sweep.yaml        # Learning rate sweep experiment
+│   └── example_config.yaml      # Example configuration file
+│
+│
+├── tests/                   # Test suite (TODO)
+│   ├── unit/                # Unit tests
+│   │   └── test_transforms.py
+│   ├── integration/         # Integration tests
+│   │   └── test_pipeline.py
+│   └── conftest.py          # Test fixtures and configuration
+│
+├── src/
+│   └── compressed_perception/ # Main package
+│       ├── __init__.py      # Package initialization
+│       │
+│       ├── models/          # Model implementations
+│       │   ├── __init__.py
+│       │   ├── architectures/   # Model architecture definitions
+│       │   │   ├── __init__.py
+│       │   │   ├── vit.py       # Vision Transformer adaptations
+│       │   │   └── simclr.py    # SimCLR adaptations
+│       │   │
+│       │   ├── evaluation/      # Model evaluation code
+│       │   │   ├── __init__.py
+│       │   │   └── metrics.py   # Evaluation metrics
+│       │   │
+│       │   ├── comparison/      # Model comparison utilities
+│       │   │   ├── __init__.py
+│       │   │   ├── compare_baseline.py  # Baseline comparison
+│       │   │   └── compare_lr_sweep.py  # Learning rate sweeping
+│       │   │
+│       │   ├── training/        # Training infrastructure
+│       │   │   ├── __init__.py
+│       │   │   ├── trainers.py      # Training loops
+│       │   │   └── callbacks.py     # Training callbacks
+│       │   │
+│       │   └── utils/           # Model utilities
+│       │       ├── __init__.py
+│       │       ├── constants.py # Model constants
+│       │       └── helpers.py   # Helper functions
+│       │
+│       ├── modules/          # Reusable modules
+│       │   ├── __init__.py
+│       │   ├── transforms/      # Image transformations
+│       │   │   ├── __init__.py
+│       │   │   ├── degradation.py  # Image degradation transforms
+│       │   │   └── augmentation.py # Data augmentation transforms
+│       │   │
+│       │   └── data_preparation/ # Data preparation utilities
+│       │       ├── __init__.py
+│       │       └── preparation.py  # Dataset preparation
+│       │
+│
+│
+├── results/
+│
+├── jobs/                         # Cluster job submission files
+│   ├── job_template.slurm        # SLURM job template
+│   ├── run.sh                    # General run script
+│   ├── run_compare_baseline.sh   # Baseline comparison script
+│   ├── run_compare_lr_sweep.sh   # Learning rate sweep script
+│   └── configs/             # Job configurations
+│
+└── media/                   # Media files for documentation
+    └── CS231N Poster.png    # Project poster
+```
 
 ## 📦 Dataset
 

diff --git a/docs/pipeline.md b/docs/pipeline.md
@@ -0,0 +1,72 @@
+<!-- This source file is part of the Daneshjou Lab projects
+
+SPDX-FileCopyrightText: 2025 Stanford University and the project authors (see AUTHORS.md)
+SPDX-License-Identifier: MIT
+-->
+
+# Model Comparison Pipeline
+
+## Overview
+
+This script (`model_comparison_models.py`) provides a baseline for comparing different image classification models at various image compression levels, including the original images. It supports fine-tuning, linear probing, and optional image degradation transforms.
+
+---
+
+## Features
+
+- **Model Support:** Vision Transformer (ViT), DINOv2, SimCLR (self-supervised backbone)
+- **Data Augmentation:** Optional JPEG compression, Gaussian blur, and color quantization
+- **Dataset Balancing:** Ensures equal samples per class for fair comparison
+- **Training & Evaluation:** Uses Hugging Face Trainer for streamlined workflows
+- **Experiment Tracking:** Integrated with Weights & Biases (`wandb`)
+- **GPU Monitoring:** Optional support via `pynvml`
+
+---
+
+## Workflow
+
+1. **Environment Setup**
+   - Loads required libraries and sets up cache directories.
+   - Checks for GPU availability.
+
+2. **Dataset Loading & Balancing**
+   - Loads the `ISIC_2019_224` dataset.
+   - Balances the dataset across filtered classes.
+
+3. **Model Initialization**
+   - Initializes model and preprocessor based on configuration.
+
+4. **Preprocessing & Augmentation**
+   - Applies resizing, normalization, and optional degradation transforms.
+
+5. **Training & Evaluation**
+   - Splits data into training and validation sets.
+   - Trains and evaluates each model, logging results to `wandb`.
+
+---
+
+## Usage
+
+```bash
+python model_comparison_models.py --resolution 224 --batch_size 256 --num_train_images 25000 --num_epochs 10 --eval_steps 10
+```
+
+## Configuration
+- Models: Edit the `models` list in the script to add or modify model configurations.
+- Transforms: Toggle `apply_transforms` in `prepare_datasets()` to enable or disable augmentations.
+- Hyperparameters: Adjust the arguments in the `main()` function for batch size, epochs, etc.
+
+## Output
+- Training and evaluation metrics are printed to the console and logged to Weights & Biases.
+- Results can be used for further analysis or ablation studies.
+
+## Extending
+- Add new models by updating the `models` list.
+- Implement new transforms in `utils/transforms.py`.
+- Add new datasets by modifying the dataset loading logic.
+
+## References
+- Hugging Face Transformers
+- PyTorch
+- Weights & Biases
+- ISIC 2019 Dataset
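
The "Dataset Balancing" feature in `docs/pipeline.md` (equal samples per class) is described but not shown in this diff. As a rough illustration only, one common way to get there is to downsample every class to the size of the rarest one. The helper name `balance_indices`, its `seed` parameter, and the toy label list are assumptions made for this sketch; they are not taken from `model_comparison_models.py`, which may balance the data differently.

```python
import random
from collections import defaultdict


def balance_indices(labels, seed=0):
    """Return sorted indices that keep the same number of samples per class.

    Hypothetical helper for illustration: each class is randomly downsampled
    to the size of the rarest class.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # The rarest class sets the per-class budget.
    per_class = min(len(indices) for indices in by_class.values())

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    kept = []
    for label in sorted(by_class):
        kept.extend(rng.sample(by_class[label], per_class))
    return sorted(kept)


# Toy labels using ISIC 2019 class abbreviations.
labels = ["MEL", "NV", "NV", "BCC", "NV", "MEL", "BCC", "NV"]
subset = balance_indices(labels)
counts = {label: sum(labels[i] == label for i in subset) for label in set(labels)}
print(counts)  # every class appears twice
```

Returning indices rather than copying samples keeps the sketch independent of the dataset container, so the same selection could be applied to a list, an array, or a dataset object that supports index-based selection.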