AI-based Radiographic Lung Score Associates with Clinical Outcomes in Adults

Official implementation of "LungScore": An AI-based radiographic score of lung integrity applicable to all adults, including non-smokers and those without overt disease.

Repository Structure

This repository is structured as follows:

📂 LungScore

Contains the code used to train and test the pipeline.

📂 stats_analysis

Contains the R scripts used to evaluate the association between the LungScore and clinical outcomes, and to generate the plots in the manuscript.

📂 config

Contains the .yaml files that define all hyperparameters and paths needed to reproduce the entire LungScore pipeline.

Run the model

To run LungScore on your dataset:

The model works on axial chest (LD)CT scans.

# Step 1: Install LungScore and all its dependencies (run in a terminal):
pip install LungScore --pre

# Step 2: Import Lung score functions
from LungScore.run import preprocess_nrrd, segment_lung, preprocess_lung, lungscore_load, lungscore_predict, predict_lungscore_riskcategory

# step 3: preprocess the NRRD file and segment the lung by passing the NRRD file path
nrrd_path = "/mnt/data/123img.nrrd"  # example path; replace with your own scan
nrrd = preprocess_nrrd(nrrd_path)
lungmask = segment_lung(nrrd)

# step 4: preprocess lung 
preprocessed_lung = preprocess_lung(lungmask, nrrd)

# step 5: load Lung Score model weights
model = lungscore_load()

# step 6: predict Lung Score (score from 0 to 1; 1 is the least impaired lung)
ai_lung_score = lungscore_predict(model, preprocessed_lung)

# step 7: predict risk group based on Lung Score splits (very low, low, moderate, high, very high)
risk_group = predict_lungscore_riskcategory(ai_lung_score)

# you can combine all in one step by:
from LungScore.run import AILungscorepredict
ai_lung_score, risk_group = AILungscorepredict(nrrd_path)
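
To score more than one scan, the one-step call can be wrapped in a loop. The sketch below is a minimal example and not part of the package: `score_directory` and its `predict_fn` parameter are hypothetical names introduced here, and it assumes the predictor returns a `(score, risk_group)` pair as `AILungscorepredict` does in the snippet above. Passing the predictor in as an argument keeps the helper independent of how LungScore is installed.

```python
import csv
from pathlib import Path


def score_directory(nrrd_dir, predict_fn, out_csv):
    """Apply predict_fn to every .nrrd file in nrrd_dir and write one CSV row per scan.

    predict_fn is assumed to take a file path (str) and return (score, risk_group).
    """
    fields = ["file", "lung_score", "risk_group", "error"]
    rows = []
    for path in sorted(Path(nrrd_dir).glob("*.nrrd")):
        try:
            score, risk_group = predict_fn(str(path))
            rows.append({"file": path.name, "lung_score": score,
                         "risk_group": risk_group, "error": ""})
        except Exception as exc:  # record the failure and continue with the next scan
            rows.append({"file": path.name, "lung_score": "",
                         "risk_group": "", "error": str(exc)})

    # Write all results, including failed scans, to a single CSV file.
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With the package installed, a call such as `score_directory("/mnt/data", AILungscorepredict, "lung_scores.csv")` would score every NRRD file in that folder; the output file name is only an example.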

Lung Score Model

Model Development

The LungScore is a deep learning–derived biomarker designed to quantify structural lung integrity from chest CT scans. The LungScore pipeline consists of two stages:

Lung Segmentation: Automated delineation of the pulmonary parenchyma using Lungmask.
Lung Score Quantification: The segmented 3D lung volume is input into a 3D Convolutional Neural Network (3D CNN). The model processes the entire pulmonary structure to output a continuous score (0 to 1), providing a global measure of lung integrity with lower values corresponding to relatively more impaired lung structure.

Model Training and Hypothesis

The LungScore was developed using a supervised 3D Convolutional Neural Network (3D CNN) trained on a curated subset of 4,228 NLST participants. In the absence of a definitive ground-truth measure for lung integrity, the training signal was derived from a hypothesized gradient of structural variation.

Hypothesis: The training cohort utilized participants at opposite ends of lung structural health (highest quartile of smoking exposure versus lowest quartile of smoking exposure with no CT chronic findings). Crucially, both groups consisted of smokers; this was intended to encourage the model to identify imaging-derived structural features rather than simply learning to distinguish smoking status, thereby potentially reducing confounding by smoking history.

Implementation Details: The 3D CNN was trained using a cross-entropy loss function with a batch size of 16 scans and a learning rate of 0.001. The model checkpoint that achieved the highest Area Under the Curve (AUC) on the tuning cohort was selected.

By capturing these shared morphological signatures, the model potentially identifies a continuous gradient of lung integrity that may maintain its prognostic relevance across diverse clinical profiles—including individuals with varying smoking histories, non-smokers, and those both with and without overt lung disease.
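
One common way for a two-class network trained with cross-entropy to produce a continuous value between 0 and 1 is to report the softmax probability of one class. The sketch below illustrates that idea only. It is an assumption, not a description of the released model, that the score corresponds to the softmax probability of the "less impaired" class, and the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy loss for one sample: -log p(target class)."""
    return -math.log(softmax(logits)[target_index])


# Hypothetical logits ordered as [more impaired, less impaired].
logits = [-0.4, 1.2]
score = softmax(logits)[1]        # assumed score: P(less impaired), strictly between 0 and 1
loss = cross_entropy(logits, 1)   # loss if the true label is "less impaired"
print(round(score, 3), round(loss, 3))
```

Under this assumption, a scan that the network confidently assigns to the "less impaired" class receives a score near 1, which matches the direction of the score described above.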

Model Validation

The LungScore was validated by its ability to associate with clinical outcomes across two independent, large-scale cohorts:

Internal Validation: A held-out test set from the NLST (n = 15,733).
External Validation: The Framingham Heart Study (FHS) cohort (n = 2,581).

Outcome Association: To assess the score's prognostic relevance, we analyzed its association with all-cause mortality, as well as with the incidence of, and cause-specific mortality from, lung cancer and cardiovascular disease.

Stratification: Participants were categorized into five groups based on percentile cutoffs derived from the NLST tuning cohort:

Very Low: ≤5th percentile
Low: 5th–25th percentile
Moderate: 25th–50th percentile
High: 50th–75th percentile
Very High: ≥75th percentile
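
The percentile bands above can be sketched as a small lookup. This is an illustration under stated assumptions, not the packaged `predict_lungscore_riskcategory`: the function names are hypothetical, the actual NLST tuning-cohort cutoffs are not reproduced here, and because the listed bands share their boundary values, the handling of scores that fall exactly on a cutoff is a choice made for this example.

```python
import statistics


def percentile_cutoffs(tuning_scores):
    """Return the 5th, 25th, 50th and 75th percentiles of the tuning scores."""
    cuts = statistics.quantiles(tuning_scores, n=100, method="inclusive")
    # cuts[k - 1] is the k-th percentile.
    return cuts[4], cuts[24], cuts[49], cuts[74]


def assign_group(score, cutoffs):
    """Map a score to one of the five bands listed above."""
    p5, p25, p50, p75 = cutoffs
    if score <= p5:
        return "Very Low"
    if score >= p75:
        return "Very High"
    if score <= p25:
        return "Low"
    if score <= p50:
        return "Moderate"
    return "High"
```

Deriving the cutoffs once from the tuning cohort and reusing them unchanged on the test cohorts keeps the group definitions fixed across datasets.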

Statistical Analysis

All statistical modeling was performed using R version 4.2.2. The association between the LungScore and clinical outcomes was assessed through:

Survival Analysis: Kaplan–Meier estimates and log-rank tests were used for univariate comparisons.
Risk Modeling: Multivariable Cox proportional hazards models were adjusted for age, sex, BMI, smoking status, pack-years, and pre-existing comorbidities.
Proportional Hazards: Assumptions were verified using Schoenfeld residuals, with age- and sex-stratified models implemented to address non-proportionality.
Independence Testing: Multicollinearity was ruled out using Variance Inflation Factors (VIF < 5) and partial correlation analyses.
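
The repository's analyses are R scripts; the Python sketch below is only meant to show what the Kaplan–Meier (product-limit) estimate computes, using a hypothetical function name and no external libraries. At each observed event time, the running survival probability is multiplied by the fraction of at-risk participants who did not have the event.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] from the product-limit estimator.

    times  : follow-up time for each participant
    events : 1 if the event was observed at that time, 0 if the participant was censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve
```

Censored participants never trigger a step in the curve, but they still count in the at-risk denominator until their follow-up ends.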

Datasets

The LungScore was trained on the National Lung Screening Trial (NLST) and tested on a held-out test set from NLST and an external dataset from the Framingham Heart Study (FHS). These datasets can be requested through official repositories as follows:

National Lung Screening Trial (NLST): NCI CDAS.
Framingham Heart Study (FHS): BioLINCC.

Image Pre-processing

Standardization of heterogeneous CT data was performed to ensure consistency across cohorts. The following pipeline was applied to baseline axial CT series from both NLST and FHS:

Series Selection: We prioritized series with a slice thickness closest to 2.5 mm (range: 0.625–3.27 mm). In cases of multiple candidates, the series with the softest reconstruction kernel was selected.
Format Conversion: DICOM files were converted to NRRD format using the SimpleITK library.
Resampling: All volumes were resampled to a fixed voxel size of 0.68 × 0.68 × 2.5 mm³.
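
The series-selection rule and the resampling arithmetic above can be sketched as follows. This is a minimal illustration, not the repository's preprocessing code: the dictionary keys, the function names, and the kernel softness ranking are all hypothetical, since the kernels and their ordering are not listed here.

```python
TARGET_THICKNESS_MM = 2.5
TARGET_SPACING_MM = (0.68, 0.68, 2.5)


def select_series(series_list, kernel_softness):
    """Pick the series whose slice thickness is closest to 2.5 mm.

    series_list     : dicts with hypothetical keys "uid", "thickness_mm", "kernel"
    kernel_softness : dict mapping kernel name -> rank, lower meaning softer (assumed ranking)
    Ties on thickness are broken by choosing the softest kernel.
    """
    return min(
        series_list,
        key=lambda s: (
            round(abs(s["thickness_mm"] - TARGET_THICKNESS_MM), 6),
            kernel_softness.get(s["kernel"], float("inf")),
        ),
    )


def resampled_size(size, spacing, new_spacing=TARGET_SPACING_MM):
    """Voxel grid size that preserves the physical extent when the spacing changes."""
    return tuple(
        max(1, round(n * s / ns)) for n, s, ns in zip(size, spacing, new_spacing)
    )
```

The grid size is the quantity a resampling routine needs alongside the new spacing: each axis keeps its physical length, so the voxel count scales by the ratio of old to new spacing.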

Data Split and Cohort Selection

Development Set (NLST, n = 4,228)
- Group 1 (More Impaired Lung): Highest quartile of smoking exposure (>66 pack-years).
- Group 2 (Less Impaired Lung): Lowest quartile of smoking exposure (<42 pack-years) with no chronic CT findings.
This subset was split 70:30 into training and tuning sets.
Internal Test Set (NLST, n = 15,733): A held-out portion reserved for independent validation.
External Validation (FHS, n = 2,581): An independent cohort used exclusively to evaluate model generalizability.
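
The group definitions and the 70:30 split can be expressed in a few lines. The sketch below uses the pack-year thresholds stated above; the function names, the 1/0 label convention, the random seed, and the use of a simple random (non-stratified) split are assumptions made for illustration.

```python
import random


def label_participant(pack_years, has_chronic_ct_findings):
    """Return 1 (more impaired), 0 (less impaired) or None (not in the development set)."""
    if pack_years > 66:
        return 1
    if pack_years < 42 and not has_chronic_ct_findings:
        return 0
    return None


def split_train_tune(ids, train_fraction=0.7, seed=0):
    """Shuffle participant ids reproducibly and split them into training and tuning sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Splitting by participant id rather than by scan keeps every scan from one person on the same side of the split.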

Environment Setup

This project was developed using Python 3.9. To ensure a stable and reproducible environment, we recommend using Conda to manage your virtual workspace and dependencies.

1. Install Conda

If you do not have Conda installed, please follow the official installation guide for your operating system: Conda Installation Guide.

2. Create and Activate the Environment

Run the following commands in your terminal to set up an isolated environment:

# Create a new environment named 'lungscore_env' with Python 3.9
conda create -n lungscore_env python=3.9 -y

# Activate the environment
conda activate lungscore_env

3. Install LungScore

Once your environment is active, install the LungScore package and its prerequisite dependencies using the following command:

# Install the latest pre-release version of LungScore
pip install LungScore --pre

Replicating Lung Score Pipeline

The LungScore pipeline utilizes YAML configuration files to manage parameters and file paths. Each step requires a specific .yaml file to define hyperparameters and directory paths.

Step 1: Lung Segmentation

To generate the necessary input for training or inference, the lungs must first be delineated from the 3D CT volumes. Update the path in lung_segmentation_pipeline.yaml so that it points to your directory of resampled chest CTs (NRRD format).

Config: config/lung_segmentation_pipeline.yaml
Action: Extracts the pulmonary parenchyma.

# Run lung extraction
python LungScore/preprocessing/extract_lung_pipeline.py

Step 2: Model Training

To reproduce the model development, use the training_pipeline.yaml. This file contains hyperparameters used to develop LungScore.

Config: config/training_pipeline.yaml
Action: Trains the 3D CNN on the segmented lungs.

# Run model training
python LungScore/training/training_pipeline.py

Step 3: Model Inference

Once the model is trained, use the testing_pipeline.yaml to apply the trained model weights to new scans. This script outputs a continuous score (0-1) reflecting the structural integrity of the segmented lung.

Config: config/testing_pipeline.yaml
Action: Generates a LungScore for each scan.

# Run inference pipeline
python LungScore/inference/inference_pipeline.py

Disclaimer

The code and data of this repository are provided to promote reproducible research. They are not intended for clinical care or commercial use.

The software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

Contact

We are happy to help. For any questions regarding this repository, please reach out to ahassan12@bwh.harvard.edu and haerts@bwh.harvard.edu.
