MultiBLiMP is a massively multilingual benchmark of linguistic minimal pairs, covering 101 languages and 6 linguistic phenomena, and containing more than 125,000 minimal pairs. This repository contains the code for creating the corpus and the scripts for LLM evaluation.
The full MultiBLiMP dataset is available on HuggingFace.
A more detailed explanation of evaluating your own LM on MultiBLiMP is provided by Catherine Arnett (thanks!) in this repository: https://github.com/catherinearnett/multiblimp
We provide a .csv dataframe of all model results here (759MB): Google Drive. Note that, to save disk space, this dataframe does not contain the original sentence pairs. In case you need those, you can download another .csv dataframe here (2.4GB): Google Drive.
The most important column in these dataframes is delta, the log-probability difference the LM assigns between the grammatical and ungrammatical sentence. Accuracy can be derived from it (delta > 0), or obtained directly by taking the mean of the pred column. Specific analyses are easy to conduct using pandas groupby functionality.
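As a sketch of such an analysis, the snippet below computes overall and per-language accuracy from a toy dataframe with the delta and pred columns described above; the language column name and the toy values are assumptions for illustration, not the actual results data.

```python
import pandas as pd

# Toy stand-in for the results dataframe; `delta` and `pred` match the
# columns described above, `language` is an assumed grouping column.
df = pd.DataFrame({
    "language": ["en", "en", "nl", "nl"],
    "delta": [1.2, -0.3, 0.8, 0.5],  # log-prob difference (correct - incorrect)
    "pred": [1, 0, 1, 1],            # 1 if the model preferred the correct sentence
})

# Accuracy as the fraction of pairs with delta > 0 ...
overall_acc = (df["delta"] > 0).mean()

# ... which equals the mean of the pred column.
assert overall_acc == df["pred"].mean()

# Per-language accuracy via groupby.
per_lang = df.groupby("language")["pred"].mean()
print(overall_acc)  # 0.75
print(per_lang)
```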
The paper has been accepted at TACL and should appear on MIT Press soon!
@misc{jumelet2025multiblimp10massivelymultilingual,
  title={MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs},
  author={Jaap Jumelet and Leonie Weissweiler and Arianna Bisazza},
  year={2025},
  eprint={2504.02768},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.02768},
}