reAIM-Lab/EHR-missingness

Mind the data gap: Missingness Still Shapes Large Language Model Prognoses

This repository contains the code to reproduce the results presented in Mind the data gap: Missingness Still Shapes Large Language Model Prognoses. In this work, we investigate how the serialization of missing values affects the zero-shot performance of LLMs.

Experimental setup

The experiments provide clinical data as input and prompt two LLMs (Qwen 3 and OSS-GPT) to predict an outcome of interest. To measure the impact of missingness, we serialize the data in two ways: with and without missingness indicators in the serialized input.
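The two serialization strategies described above can be illustrated with a minimal sketch. This is not the repository's implementation: the function `serialize_record`, the feature names, and the `"not measured"` token are all hypothetical and chosen only to show the difference between the two input formats.

```python
from typing import Mapping, Optional, Sequence


def serialize_record(
    record: Mapping[str, Optional[float]],
    features: Sequence[str],
    include_missing: bool,
    missing_token: str = "not measured",  # hypothetical indicator text
) -> str:
    """Turn one patient's measurements into a text block for an LLM prompt."""
    lines = []
    for name in features:
        value = record.get(name)  # None if the key is absent or unobserved
        if value is not None:
            lines.append(f"{name}: {value}")
        elif include_missing:
            # Strategy 1: state explicitly that the feature was not observed.
            lines.append(f"{name}: {missing_token}")
        # Strategy 2: otherwise the unobserved feature is omitted entirely.
    return "\n".join(lines)


# Hypothetical feature list and patient record, for illustration only.
features = ["heart_rate", "lactate", "creatinine"]
patient = {"heart_rate": 88.0, "lactate": None}

with_indicators = serialize_record(patient, features, include_missing=True)
without_indicators = serialize_record(patient, features, include_missing=False)
```

With these inputs, `with_indicators` lists all three features and marks `lactate` and `creatinine` as not measured, while `without_indicators` contains only the `heart_rate` line. The model therefore sees the same observed values in both cases, and only the explicit signal about what was not measured differs.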

To reproduce the paper's results:

  1. Generate the MIMIC-IV MEDS build and task cohorts
    Use the following tutorial to construct the MIMIC-IV MEDS build and downstream task cohorts.
    Follow the instructions in that repository to create the required inputs.

  2. Create the final evaluation cohort

    This step extracts the clinical measurements and formats them into the final evaluation cohort used for inference. From this repository, run:

    python main.py --experiment mimic --mode generate_cohort
    
  3. Run inference

    Generate LLM predictions by running:

    python main.py --experiment mimic --mode test
    

Requirements

  • Python 3.11
  • vLLM for efficient inference.

To install with conda:

conda env create -f environment.yml
conda activate vllm_env
