This repo contains 25 image-question pairs from the medical domain. It is intended to test multimodal capabilities: the questions were prefiltered so that they cannot be answered without an accurate understanding of the visual content.
The .csv file contains the questions, the answer choices formatted as a list of strings, the correct answer, and a reference to the corresponding image (image_nr).
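
As a rough sketch of how the file could be read: because the answer choices are stored as a list of strings, they will most likely arrive as plain text in a single CSV cell and need to be decoded back into a list. The column names `question`, `answer_choices` and `correct_answer` below are assumptions made for illustration (only `image_nr` is named above), and the sample row is invented, so adjust both to match the actual file.

```python
import ast
import csv
import io


def load_questions(csv_file, choices_column="answer_choices"):
    """Read rows from an open CSV file and decode the stringified choice list.

    `choices_column` is a hypothetical column name; change it to whatever
    the header of the real .csv file uses.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        # The choices are text such as "['A', 'B']"; literal_eval turns that
        # text back into a Python list without executing arbitrary code.
        row[choices_column] = ast.literal_eval(row[choices_column])
        rows.append(row)
    return rows


# Invented sample row for demonstration only, not taken from the dataset.
sample = io.StringIO(
    'question,answer_choices,correct_answer,image_nr\n'
    '"What does the image show?","[\'Finding A\', \'Finding B\']",Finding A,1\n'
)
questions = load_questions(sample)
```

For the real data, the same function can be called with a file opened via `open(path, newline="", encoding="utf-8")`. Note that `csv.DictReader` returns every value as a string, so `image_nr` would still need converting if an integer is wanted.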
The data was manually compiled by me. It aims to be representative of material that may be encountered in a medical licensing exam, but it may be inaccurate or outdated. Use it for benchmarking purposes only.