
jumelet/multiblimp


MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs


MultiBLiMP is a massively multilingual benchmark of linguistic minimal pairs, covering 101 languages, 6 linguistic phenomena and containing more than 125,000 minimal pairs. This repository contains the code for creating the corpus and the scripts for LLM evaluation.

Dataset

The full MultiBLiMP dataset is available on HuggingFace.

A more detailed explanation of evaluating your own LM on MultiBLiMP is provided by Catherine Arnett (thanks!) in this repository: https://github.com/catherinearnett/multiblimp

Results

We provide a .csv dataframe of all model results (759MB) on Google Drive. To save disk space, this dataframe does not contain the original sentence pairs. If you need those, a second .csv dataframe that includes them (2.4GB) is also available on Google Drive.

The most important column in these dataframes is delta, the LM's log probability difference between the two sentences of a minimal pair. A pair counts as correct when delta > 0, so accuracy can be computed either as the fraction of rows with delta > 0 or directly as the mean of the pred column. Specific tests, such as accuracy per model or per language, should be easy to run with pandas groupby.
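As a rough sketch of that kind of analysis, the snippet below builds a tiny stand-in dataframe and computes accuracy overall and per group. Only `delta` and `pred` are column names taken from this README. The `model` and `lang` columns, the toy values, and the assumption that `pred` is stored as 0/1 are illustrative and may not match the actual file.

```python
import pandas as pd

# Toy stand-in for the results dataframe. With the real file you would start
# from something like pd.read_csv("results.csv") (hypothetical filename).
# `model` and `lang` are hypothetical column names used for illustration.
df = pd.DataFrame(
    {
        "model": ["lm_a", "lm_a", "lm_a", "lm_a", "lm_b", "lm_b", "lm_b", "lm_b"],
        "lang": ["nld", "nld", "tur", "tur", "nld", "nld", "tur", "tur"],
        "delta": [1.5, -0.25, 0.75, 2.0, -1.0, 0.5, -0.5, -2.0],
    }
)

# A pair is counted as correct when the log probability difference is positive.
df["pred"] = (df["delta"] > 0).astype(int)

# Overall accuracy: fraction of pairs with delta > 0.
overall = (df["delta"] > 0).mean()

# Accuracy per (model, language) group: mean of the pred column.
per_group = df.groupby(["model", "lang"])["pred"].mean()

print(f"overall accuracy: {overall:.2f}")
print(per_group)
```

Grouping on other columns works the same way: swap the list passed to `groupby` for whichever columns the real dataframe provides.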

Citation

The paper has been accepted into TACL and should be on MIT Press soon!

@misc{jumelet2025multiblimp10massivelymultilingual,
      title={MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs}, 
      author={Jaap Jumelet and Leonie Weissweiler and Arianna Bisazza},
      year={2025},
      eprint={2504.02768},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.02768}, 
}
