CheXmix (CVPR Findings 2026) is an early-fusion multi-modal chest X-ray vision-language model capable of fine-grained discriminative and generative tasks.
For an editable installation, use the following commands to clone and install this repository.
conda create --name chexmix python==3.10
conda activate chexmix
git clone https://github.com/StanfordMIMI/CheXmix
cd CheXmix
pip install -e .
To create a CheXmix model with generative capabilities enabled by default, use the following:
from chexmix import CheXmix
model = CheXmix()
To instantiate a model that outputs image embeddings layer by layer, use:
from chexmix import CheXmix
model = CheXmix(ImageEmbeddings=True)
To instantiate a model for report generation, use:
from chexmix import CheXmix
model = CheXmix(ReportGeneration=True)
For inference on a demo chest x-ray, please check out the general demo.
For additional information, please read the inference documentation.
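The three instantiation modes shown above can be collected behind one small helper, which may be convenient in scripts that switch between them. The following is a minimal sketch, not part of the chexmix package: the mode labels (`"generative"`, `"embeddings"`, `"report"`) and the helper names `chexmix_kwargs` and `build_model` are hypothetical, and only the constructor arguments already shown in this README are used. Whether the two flags can be combined is not covered here.

```python
import importlib
import importlib.util

# Constructor keyword arguments copied from the snippets in this README.
# The mode labels on the left are hypothetical, not part of the chexmix API.
MODE_KWARGS = {
    "generative": {},                         # CheXmix()
    "embeddings": {"ImageEmbeddings": True},  # CheXmix(ImageEmbeddings=True)
    "report": {"ReportGeneration": True},     # CheXmix(ReportGeneration=True)
}


def chexmix_kwargs(mode: str) -> dict:
    """Return a copy of the constructor kwargs for a hypothetical mode label."""
    if mode not in MODE_KWARGS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_KWARGS)}"
        )
    return dict(MODE_KWARGS[mode])


def build_model(mode: str = "generative"):
    """Instantiate CheXmix for the given mode, importing the package lazily."""
    kwargs = chexmix_kwargs(mode)
    # Fail with a clear message if the editable install has not been done yet.
    if importlib.util.find_spec("chexmix") is None:
        raise ImportError(
            "chexmix is not installed; run `pip install -e .` in the cloned repository"
        )
    chexmix = importlib.import_module("chexmix")
    return chexmix.CheXmix(**kwargs)
```

With the package installed, `model = build_model("report")` would be equivalent to `CheXmix(ReportGeneration=True)`.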
If you find this repository useful for your work, please cite the paper:
@inproceedings{kumar2026chexmix,
author = {Kumar, Ashwin and Holland, Robbie and Barrett, Corey and Kim, Jangwon and Varma, Maya and Chen, Zhihong and Gao, Yunhe and Zaharchuk, Greg and Taghavi, Tara and Kenthapadi, Krishnaram and Chaudhari, Akshay},
title = {CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
pages = {9466--9476},
year = {2026},
note = {arXiv preprint arXiv:2604.22989}
}