This is a repository containing code and data for the paper:
K. Noorbakhsh, M. Sulaiman, M. Sharifi, K. Roy and P. Jamshidi. Pretrained Language Models are Symbolic Mathematics Solvers too!
This code depends on the following packages:

- Torch
- NumPy
- SymPy
- Transformers
- Apex
- `trainer.py` contains code for fine-tuning the pre-trained language models. Please modify the following parameters before running:
  - `language`: the pre-trained language.
  - `Model_Type`: mbart or Marian.
  - `path1` and `path2`: the paths of the training and validation data.
  - `max_input_length` and `max_output_length`: 1024 for the mBART model and 512 for the Marian-MT model.
  - `model_name`: the name under which the fine-tuned model will be saved.
- `evaluator.py` contains code for evaluating the fine-tuned language model on the symbolic math data. Please modify parameters 1-4 as in the `trainer.py` section, and also modify the following:
  - `path`: the path of the test dataset.
  - `saved_model`: the path of the saved fine-tuned model.
- `src/hf_utils.py` contains code for reading the datasets and some utilities for evaluation.
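When scoring model outputs on symbolic math, a string match is too strict: a prediction can be written differently from the reference and still be mathematically correct. As a minimal sketch of such an equivalence check with SymPy (the function below is illustrative, not the repo's actual API):

```python
import sympy as sp

def symbolically_equivalent(predicted: str, reference: str) -> bool:
    """Check whether two expressions are mathematically equal
    by simplifying their difference down to zero."""
    diff = sp.simplify(sp.sympify(predicted) - sp.sympify(reference))
    return diff == 0

# A model output can differ textually from the reference yet still be correct:
print(symbolically_equivalent("sin(x)**2 + cos(x)**2", "1"))    # True
print(symbolically_equivalent("(x + 1)**2", "x**2 + 2*x + 1"))  # True
print(symbolically_equivalent("x**2", "x**3"))                  # False
```

Simplifying the difference (rather than comparing simplified forms directly) sidesteps the fact that `simplify` does not guarantee a unique canonical form for equal expressions.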
The rest of the code is adapted from *Deep Learning for Symbolic Mathematics* (Lample et al.).
The datasets are available here.
- `train`, `valid`, and `test` files contain the training, validation, and test datasets for the mBART model.
- `language_data` contains the training, validation, and test datasets for the Marian-MT model.
- `distribution_test` contains the test files for the distribution-shift section (polynomial, trigonometric, and logarithmic).
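For intuition about what these datasets contain: following the setup of Lample et al., each example is a (problem, solution) pair over symbolic expressions, e.g. a function and its derivative. A pair like this can be produced with SymPy (this snippet is only an illustration of the task, not the repo's data pipeline):

```python
import sympy as sp

x = sp.symbols("x")

# Hypothetical (problem, solution) pair for the differentiation task:
# the input is a function of x, the target is its derivative.
f = sp.sin(x) * sp.exp(x)
df = sp.diff(f, x)
print(f"{f} -> {df}")
```

The model never sees SymPy objects directly; expressions are serialized to token sequences before being fed to the seq2seq model.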
Please cite us if you use our work in your research.
```bibtex
@article{noorbakhsh2021pretrained,
  title={Pretrained Language Models are Symbolic Mathematics Solvers too!},
  author={Kimia Noorbakhsh and Modar Sulaiman and Mahdi Sharifi and Kallol Roy and Pooyan Jamshidi},
  journal={arXiv preprint arXiv:2110.03501},
  year={2021}
}
```