AIM-Harvard/LungScore

AI-based Radiographic Lung Score Associates with Clinical Outcomes in Adults

Official implementation of "LungScore": An AI-based radiographic score of lung integrity applicable to all adults, including non-smokers and those without overt disease.

Lung Score Overview

Repository Structure

This repository is structured as follows:

  • 📂 LungScore

    Contains the code used to train and test the pipeline.

  • 📂 stats_analysis

    Contains R scripts used to evaluate the association between the LungScore and clinical outcomes, and to export the plots in the Manuscript.

  • 📂 config

    Contains .yaml files that define all hyperparameters and paths needed to reproduce the entire LungScore pipeline.

  • Run the model

    To run Lung Score on your dataset:

    The model works on axial chest (LD)CT scans.

    # Step 1: Install LungScore and its dependencies (run this in a terminal, not in Python):
    #   pip install LungScore --pre
    
    # Step 2: Import the Lung Score functions
    from LungScore.run import (
        preprocess_nrrd, segment_lung, preprocess_lung,
        lungscore_load, lungscore_predict, predict_lungscore_riskcategory,
    )
    
    # Step 3: Preprocess the NRRD file and segment the lung
    nrrd_path = "/mnt/data/123img.nrrd"  # example path; replace with your own scan
    nrrd = preprocess_nrrd(nrrd_path)
    lungmask = segment_lung(nrrd)
    
    # Step 4: Preprocess the lung
    preprocessed_lung = preprocess_lung(lungmask, nrrd)
    
    # Step 5: Load the Lung Score model weights
    model = lungscore_load()
    
    # Step 6: Predict the Lung Score (from 0 to 1; 1 is the least impaired lung)
    ai_lung_score = lungscore_predict(model, preprocessed_lung)
    
    # Step 7: Predict the risk group from the Lung Score splits (very low, low, moderate, high, very high)
    risk_group = predict_lungscore_riskcategory(ai_lung_score)
    
    # Alternatively, run all steps in one call:
    from LungScore.run import AILungscorepredict
    ai_lung_score, risk_group = AILungscorepredict(nrrd_path)
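
For more than one scan, the one-call form above can be wrapped in a small loop. The sketch below is not part of the LungScore package: `score_directory` is a hypothetical helper, and it assumes the prediction function takes a path string and returns a `(score, risk_group)` pair, as `AILungscorepredict` does in the example above.

```python
import csv
from pathlib import Path
from typing import Callable, List, Tuple


def score_directory(
    scan_dir: str,
    predict: Callable[[str], Tuple[object, object]],
    out_csv: str,
) -> List[Tuple[str, object, object]]:
    """Score every .nrrd file in scan_dir and write the results to out_csv.

    `predict` is any callable mapping an NRRD path to (score, risk_group).
    """
    rows = []
    for nrrd_path in sorted(Path(scan_dir).glob("*.nrrd")):
        score, risk_group = predict(str(nrrd_path))
        rows.append((nrrd_path.name, score, risk_group))

    # One header line followed by one line per scan.
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["scan", "ai_lung_score", "risk_group"])
        writer.writerows(rows)
    return rows
```

With the package installed, a call could look like `score_directory("/mnt/data", AILungscorepredict, "scores.csv")` after `from LungScore.run import AILungscorepredict`.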
    

    Lung Score Model

    Model Development

    The LungScore is a deep learning–derived biomarker designed to quantify structural lung integrity from chest CT scans. The LungScore pipeline consists of two stages:

    1. Lung Segmentation: Automated delineation of the pulmonary parenchyma using Lungmask.
    2. Lung Score Quantification: The segmented 3D lung volume is input into a 3D Convolutional Neural Network (3D CNN). The model processes the entire pulmonary structure to output a continuous score (0 to 1), providing a global measure of lung integrity with lower values corresponding to relatively more impaired lung structure.

    Model Training and Hypothesis

    The LungScore was developed using a supervised 3D Convolutional Neural Network (3D CNN) trained on a curated subset of 4,228 NLST participants. In the absence of a definitive ground-truth measure for lung integrity, the training signal was derived from a hypothesized gradient of structural variation.

    Hypothesis: The training cohort utilized participants at opposite ends of lung structural health (highest quartile of smoking exposure versus lowest quartile of smoking exposure with no CT chronic findings). Crucially, both groups consisted of smokers; this was intended to encourage the model to identify imaging-derived structural features rather than simply learning to distinguish smoking status, thereby potentially reducing confounding by smoking history.

    Implementation Details: The 3D CNN was trained using a cross-entropy loss function with a batch size of 16 scans and a learning rate of 0.001. The model checkpoint with the highest Area Under the Curve (AUC) on the tuning cohort was selected.

    By capturing these shared morphological signatures, the model potentially identifies a continuous gradient of lung integrity that may maintain its prognostic relevance across diverse clinical profiles—including individuals with varying smoking histories, non-smokers, and those both with and without overt lung disease.
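
The checkpoint selection mentioned under Implementation Details can be illustrated with a minimal sketch. It is not the authors' training code: `roc_auc` and `best_checkpoint` are hypothetical helpers, and treating label 1 as the positive class and preferring the earliest epoch on ties are assumptions.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def best_checkpoint(
    labels: Sequence[int], scores_by_epoch: Dict[int, Sequence[float]]
) -> int:
    """Return the epoch whose tuning-set predictions give the highest AUC.

    Epochs are visited in ascending order, so the earliest epoch wins ties.
    """
    return max(
        sorted(scores_by_epoch),
        key=lambda epoch: roc_auc(labels, scores_by_epoch[epoch]),
    )
```

The pairwise loop is quadratic in the cohort size; it is meant to make the definition explicit, not to be fast.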

    Lung Score Pipeline

    Model Validation

    The LungScore was validated by assessing its association with clinical outcomes in two independent, large-scale cohorts:

    • Internal Validation: A held-out test set from the NLST (n = 15,733).
    • External Validation: The Framingham Heart Study (FHS) cohort (n = 2,581).

    Outcome Association: To assess the score's prognostic relevance, we analyzed its association with all-cause mortality, as well as with the incidence of, and cause-specific mortality from, lung cancer and cardiovascular disease.

    Stratification: Participants were categorized into five groups based on percentile cutoffs derived from the NLST tuning cohort:

    • Very Low: ≤5th percentile
    • Low: 5th–25th percentile
    • Moderate: 25th–50th percentile
    • High: 50th–75th percentile
    • Very High: ≥75th percentile
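
The grouping above can be sketched as a lookup against four percentile cutoffs. This is an illustration only: the actual cutoff values from the NLST tuning cohort are not listed here, so the sketch derives them from whatever tuning scores are passed in. `percentile_cutoffs` and `lungscore_group` are hypothetical names, and sending a score that falls exactly on a cutoff to the lower group is an assumption.

```python
from bisect import bisect_left
from statistics import quantiles
from typing import List, Sequence

GROUPS = ["Very Low", "Low", "Moderate", "High", "Very High"]


def percentile_cutoffs(tuning_scores: Sequence[float]) -> List[float]:
    """Return the 5th, 25th, 50th and 75th percentiles of the tuning scores."""
    # quantiles(..., n=100) yields 99 cut points; index i - 1 is the i-th percentile.
    cuts = quantiles(tuning_scores, n=100, method="inclusive")
    return [cuts[4], cuts[24], cuts[49], cuts[74]]


def lungscore_group(score: float, cutoffs: Sequence[float]) -> str:
    """Map a score to one of the five groups using ascending cutoffs."""
    # bisect_left counts the cutoffs strictly below the score,
    # which is the index of the group the score falls into.
    return GROUPS[bisect_left(cutoffs, score)]
```

In practice, `predict_lungscore_riskcategory` from the package should be preferred, since it ships with the cutoffs used by the authors.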

    Statistical Analysis

    All statistical modeling was performed using R version 4.2.2. The association between the LungScore and clinical outcomes was assessed through:

    • Survival Analysis: Kaplan–Meier estimates and log-rank tests were used for univariate comparisons.
    • Risk Modeling: Multivariable Cox proportional hazards models, adjusted for age, sex, BMI, smoking status, pack-years, and pre-existing comorbidities.
    • Proportional Hazards: Assumptions were verified using Schoenfeld residuals, with age- and sex-stratified models implemented to address non-proportionality.
    • Independence Testing: Multicollinearity was ruled out using Variance Inflation Factors (VIF < 5) and partial correlation analyses.
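
The analyses listed above were run in R (see the stats_analysis folder). For readers who want to see what a Kaplan–Meier estimate computes, here is a minimal Python sketch of the estimator. It is not the authors' code, and `kaplan_meier` is a hypothetical helper written for illustration.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one event.

    events[i] is 1 if the outcome was observed at times[i], 0 if censored.
    """
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    for t, group in groupby(ordered, key=lambda pair: pair[0]):
        members = list(group)
        deaths = sum(1 for _, event in members if event)
        if deaths:
            # Multiply by the fraction of at-risk participants who survive time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at time t (event or censored) leaves the risk set.
        at_risk -= len(members)
    return curve
```

Comparing such curves between LungScore groups is what the log-rank test formalizes; the adjusted analyses require the Cox models described above.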

    Datasets

    The LungScore was trained on the National Lung Screening Trial (NLST) and tested on a held-out test set from NLST and an external dataset from the Framingham Heart Study (FHS). These datasets can be requested through official repositories as follows:

    • National Lung Screening Trial (NLST): NCI CDAS.
    • Framingham Heart Study (FHS): BioLINCC.

    Image Pre-processing

    Standardization of heterogeneous CT data was performed to ensure consistency across cohorts. The following pipeline was applied to baseline axial CT series from both NLST and FHS:

    • Series Selection: We prioritized series with a slice thickness closest to 2.5 mm (range: 0.625–3.27 mm). In cases of multiple candidates, the series with the softest reconstruction kernel was selected.
    • Format Conversion: DICOM files were converted to NRRD format using the SimpleITK library.
    • Resampling: All volumes were resampled to a fixed voxel spacing of 0.68 x 0.68 x 2.5 mm³.
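
The series selection and resampling steps above can be sketched as follows. This is a sketch under assumptions, not the repository's preprocessing code: the `kernel_sharpness` rank (lower means softer) is a hypothetical field, and linear interpolation with a fill value of -1024 HU is an illustrative choice that is not stated above. SimpleITK is imported only inside `resample_image`.

```python
from typing import Dict, List, Sequence

TARGET_THICKNESS_MM = 2.5
TARGET_SPACING = (0.68, 0.68, 2.5)  # x, y, z spacing in mm


def select_series(candidates: Sequence[Dict]) -> Dict:
    """Pick the series closest to 2.5 mm; break ties with the softest kernel.

    Each candidate needs 'slice_thickness_mm' and 'kernel_sharpness'
    (a hypothetical numeric rank where lower means softer).
    """
    return min(
        candidates,
        key=lambda s: (
            abs(s["slice_thickness_mm"] - TARGET_THICKNESS_MM),
            s["kernel_sharpness"],
        ),
    )


def resampled_size(
    size: Sequence[int],
    spacing: Sequence[float],
    target: Sequence[float] = TARGET_SPACING,
) -> List[int]:
    """Grid size that keeps the physical extent constant at the new spacing."""
    return [max(1, round(n * old / new)) for n, old, new in zip(size, spacing, target)]


def resample_image(image, target=TARGET_SPACING, default_value=-1024):
    """Resample a SimpleITK image to the target spacing (assumed settings)."""
    import SimpleITK as sitk  # imported lazily so the helpers above work without it

    return sitk.Resample(
        image,
        resampled_size(image.GetSize(), image.GetSpacing(), target),
        sitk.Transform(),      # identity transform
        sitk.sitkLinear,       # assumed interpolator
        image.GetOrigin(),
        target,
        image.GetDirection(),
        default_value,         # assumed fill value for voxels outside the input
        image.GetPixelID(),
    )
```

Keeping the size calculation separate from the SimpleITK call makes it easy to check the expected output grid before resampling a large volume.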

    Data Split and Cohort Selection

    • Development Set (NLST, n = 4,228)
      • Group 1 (More Impaired Lung): Highest quartile of smoking exposure (>66 pack-years).
      • Group 2 (Less Impaired Lung): Lowest quartile of smoking exposure (<42 pack-years) with no chronic CT findings.
      This subset was split 70:30 into training and tuning sets.
    • Internal Test Set (NLST, n = 15,733): A held-out portion reserved for independent validation.
    • External Validation (FHS, n = 2,581): An independent cohort used exclusively to evaluate model generalizability.
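
The group definitions and the 70:30 split above can be sketched as below. The pack-year thresholds come from the text; everything else is an assumption for illustration, including the helper names, the use of a seeded random shuffle, and the absence of stratification, since the exact splitting procedure is not described here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def assign_group(pack_years: float, has_chronic_ct_findings: bool) -> Optional[int]:
    """Return 1 (more impaired), 0 (less impaired), or None if not in either group."""
    if pack_years > 66:
        return 1
    if pack_years < 42 and not has_chronic_ct_findings:
        return 0
    return None  # participant is outside the development groups


def split_train_tune(
    participant_ids: Sequence[str], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly, then split into training and tuning sets."""
    shuffled = list(participant_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

Splitting by participant ID rather than by scan keeps all scans from one person on the same side of the split.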

    Environment Setup

    This project was developed using Python 3.9. To ensure a stable and reproducible environment, we recommend using Conda to manage your virtual workspace and dependencies.

    1. Install Conda

    If you do not have Conda installed, please follow the official installation guide for your operating system: Conda Installation Guide.

    2. Create and Activate the Environment

    Run the following commands in your terminal to set up an isolated environment:

    # Create a new environment named 'lungscore_env' with Python 3.9
    conda create -n lungscore_env python=3.9 -y
    
    # Activate the environment
    conda activate lungscore_env

    3. Install LungScore

    Once your environment is active, install the LungScore package and its prerequisite dependencies using the following command:

    # Install the latest pre-release version of LungScore
    pip install LungScore --pre
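
After installation, a quick check can confirm that the active interpreter and package match the setup described above. This is a minimal sketch; `check_environment` is a hypothetical helper, and it only tests that a module named `LungScore` can be located, not that its dependencies work.

```python
import importlib.util
import sys
from typing import Dict, Tuple


def check_environment(
    expected: Tuple[int, int] = (3, 9), package: str = "LungScore"
) -> Dict[str, object]:
    """Report the Python version and whether the package can be found."""
    return {
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_matches": tuple(sys.version_info[:2]) == expected,
        # find_spec returns None when no importable module has this name.
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```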
    

    Replicating Lung Score Pipeline

    The LungScore pipeline utilizes YAML configuration files to manage parameters and file paths. Each step requires a specific .yaml file to define hyperparameters and directory paths.
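
Before running a step, it can help to confirm that the paths in its configuration file exist. The sketch below makes several assumptions: it uses PyYAML's `yaml.safe_load` to read the file, and it treats any string whose key ends in `path` or `dir` as a filesystem path. The actual key names in the config files are not listed here, so that naming rule is hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def load_config(config_path: str) -> Dict:
    """Read a YAML config file (requires PyYAML)."""
    import yaml  # imported lazily so missing_paths works without PyYAML

    with open(config_path) as handle:
        return yaml.safe_load(handle)


def missing_paths(config: Dict, suffixes: Tuple[str, ...] = ("path", "dir")) -> List[str]:
    """Return dotted key names whose path-like values do not exist on disk."""
    missing: List[str] = []

    def walk(node, prefix: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{prefix}.{key}" if prefix else str(key))
        elif isinstance(node, str) and prefix.lower().endswith(suffixes):
            if not Path(node).exists():
                missing.append(prefix)

    walk(config, "")
    return missing
```

A call such as `missing_paths(load_config("config/lung_segmentation_pipeline.yaml"))` would then list the entries to fix. Output locations that a step creates itself may be reported as missing, which is expected.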

    Step 1: Lung Segmentation

    To generate the necessary input for training or inference, the lungs must first be delineated from the 3D CT volumes. Update the path in lung_segmentation_pipeline.yaml to point to your directory of resampled chest CTs (NRRD format).

    • Config: config/lung_segmentation_pipeline.yaml
    • Action: Extracts the pulmonary parenchyma.
    # Run lung extraction
    python LungScore/preprocessing/extract_lung_pipeline.py 
    

    Step 2: Model Training

    To reproduce the model development, use the training_pipeline.yaml. This file contains hyperparameters used to develop LungScore.

    • Config: config/training_pipeline.yaml
    • Action: Trains the 3D CNN on the segmented lungs.
    # Run model training
    python LungScore/training/training_pipeline.py 
    

    Step 3: Model Inference

    Once the model is trained, use the testing_pipeline.yaml to apply the trained model weights to new scans. This script outputs a continuous score (0-1) reflecting the structural integrity of the segmented lung.

    • Config: config/testing_pipeline.yaml
    • Action: Generates a LungScore for each scan.
    # Run inference pipeline
    python LungScore/inference/inference_pipeline.py 
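
The three steps can also be chained from a single Python entry point. The sketch below uses the script paths given in the steps above; `pipeline_commands` and `run_pipeline` are hypothetical helpers, and running each script from the repository root with no command-line arguments is an assumption based on the commands shown.

```python
import subprocess
import sys
from typing import List

# (step name, script path relative to the repository root)
STEPS = [
    ("lung segmentation", "LungScore/preprocessing/extract_lung_pipeline.py"),
    ("model training", "LungScore/training/training_pipeline.py"),
    ("model inference", "LungScore/inference/inference_pipeline.py"),
]


def pipeline_commands() -> List[List[str]]:
    """Build one command per step, using the current Python interpreter."""
    return [[sys.executable, script] for _, script in STEPS]


def run_pipeline(repo_root: str = ".", dry_run: bool = True) -> None:
    """Print each command; execute it from repo_root unless dry_run is set."""
    for (name, _), command in zip(STEPS, pipeline_commands()):
        print(f"[{name}] {' '.join(command)}")
        if not dry_run:
            # check=True stops the pipeline as soon as a step fails.
            subprocess.run(command, cwd=repo_root, check=True)
```

Calling `run_pipeline()` only prints the commands; pass `dry_run=False` once the YAML files are configured.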
    

    Disclaimer

    The code and data of this repository are provided to promote reproducible research. They are not intended for clinical care or commercial use.

    The software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

    Contact

    We are happy to help. For any questions regarding this repository, please reach out to ahassan12@bwh.harvard.edu and haerts@bwh.harvard.edu.
