Enhancing Graph-to-Text Systems in Low-Resource Settings: Distilling Chain-of-Thought Reasoning for Task-Specific Workflows
Here we present the code needed to replicate the experiments from our final report, Enhancing Graph-to-Text Systems in Low-Resource Settings: Distilling Chain-of-Thought Reasoning for Task-Specific Workflows. All preprocessing and fine-tuning steps can be found in the notebooks in the following folders:
- PLMs finetuned for data to text: We pre-train and fine-tune a FLAN-T5-small model on the 2017 version of the WebNLG dataset for the graph-to-text task and evaluate performance with the BLEU, NLTK BLEU, and chrF++ metrics. This is a reproduction of the paper "Investigating Pretrained Language Models for Graph-to-Text Generation" (NLP4ConvAI at EMNLP, 2021). A minimal fine-tuning sketch is given after this list.
- Adapter, PLM with GNN for data to text: Our attempt to reproduce the GNN adapter model from the paper "Structural Adapters in Pretrained Language Models for AMR-to-Text Generation". A simplified adapter sketch appears after this list.
- Knowledge transfer, PLM finetuned on different data to text tasks: To harness the potential of multi-task learning, we incorporate supplementary tasks that share the same underlying structure as our primary graph-to-text task. We train FLAN-T5-base models on (1) WebNLG 2020, (2) DART, and (3) a mixed dataset of WebNLG 2020, DART, and E2E, and then evaluate performance on each of the three datasets separately. See the dataset-mixing sketch after this list.
- Dataset augmentation via prompt engineering: Inspired by the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" and by knowledge distillation techniques ("Textbooks Are All You Need"), we built a pipelined model to infuse chain-of-thought reasoning into T5-small and T5-base models on the WebNLG 2020 dataset for graph-to-text generation. See the augmentation sketch after this list.
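For the first folder, the sketch below outlines the fine-tuning loop described above, using Hugging Face `transformers` and `datasets`. The dataset identifier, linearization tags, and hyperparameters are illustrative assumptions and may not match our notebooks exactly.

```python
# Minimal sketch: fine-tune FLAN-T5-small on linearized WebNLG 2017 triples.
# Dataset identifier, field names, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The Hugging Face hub hosts WebNLG 2017 under "web_nlg"
# (may require trust_remote_code=True depending on the datasets version).
raw = load_dataset("web_nlg", "webnlg_challenge_2017")

def linearize(example):
    # Flatten each "subject | predicate | object" triple into tagged text.
    triples = " ".join(
        "<H> {} <R> {} <T> {}".format(*t.split(" | ", 2))
        for t in example["modified_triple_sets"]["mtriple_set"][0]
    )
    target = example["lex"]["text"][0] if example["lex"]["text"] else ""
    inputs = tokenizer("translate Graph to English: " + triples, max_length=384, truncation=True)
    inputs["labels"] = tokenizer(text_target=target, max_length=384, truncation=True)["input_ids"]
    return inputs

train = raw["train"].map(linearize, remove_columns=raw["train"].column_names)
dev = raw["dev"].map(linearize, remove_columns=raw["dev"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-webnlg2017",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    num_train_epochs=5,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train,
    eval_dataset=dev,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```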
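For the adapter folder, the snippet below is only our simplified reading of the idea behind structural adapters: a small bottleneck module with a graph-convolution-style step, inserted into a frozen PLM so that only the adapter parameters are trained. It is not the exact StructAdapt architecture; the names, shapes, and the plain message-passing step are assumptions.

```python
# Simplified sketch of a structural (GNN) adapter block; NOT the exact StructAdapt layer.
# hidden_states come from a frozen PLM layer; adjacency encodes the input graph's structure.
import torch
import torch.nn as nn

class StructuralAdapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)        # down-projection
        self.graph_proj = nn.Linear(bottleneck, bottleneck)   # stands in for a graph-conv weight
        self.up = nn.Linear(bottleneck, hidden_size)          # up-projection
        self.act = nn.ReLU()
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, hidden_states: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, nodes, hidden); adjacency: (batch, nodes, nodes), row-normalized.
        residual = hidden_states
        x = self.act(self.down(hidden_states))
        x = self.act(self.graph_proj(torch.bmm(adjacency, x)))  # message passing over graph edges
        return self.norm(residual + self.up(x))

# Toy usage: only the adapter parameters would be trained; the PLM stays frozen.
adapter = StructuralAdapter(hidden_size=512)   # FLAN-T5-small hidden size is 512
h = torch.randn(2, 10, 512)
adj = torch.eye(10).repeat(2, 1, 1)            # placeholder adjacency for 10 graph nodes
out = adapter(h, adj)                          # shape (2, 10, 512)
```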
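For the knowledge-transfer folder, the sketch below shows one way the mixed training set can be assembled: every corpus is flattened to (source, target) pairs with a task prefix, then concatenated and shuffled. The Hugging Face dataset identifiers, field names, and prefixes are assumptions; our notebooks do their own preprocessing.

```python
# Illustrative assembly of a mixed WebNLG 2020 + DART + E2E training set with task prefixes.
# Dataset identifiers and field names are assumptions based on the Hugging Face hub versions.
from datasets import load_dataset, concatenate_datasets

def flatten_dart(ex):
    triples = " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in ex["tripleset"])
    return {"source": "verbalize DART graph: " + triples,
            "target": ex["annotations"]["text"][0]}

def flatten_e2e(ex):
    return {"source": "verbalize E2E MR: " + ex["meaning_representation"],
            "target": ex["human_reference"]}

def flatten_webnlg(ex):
    triples = " ".join("<H> {} <R> {} <T> {}".format(*t.split(" | ", 2))
                       for t in ex["modified_triple_sets"]["mtriple_set"][0])
    return {"source": "verbalize WebNLG graph: " + triples,
            "target": ex["lex"]["text"][0] if ex["lex"]["text"] else ""}

dart = load_dataset("dart", split="train")
dart = dart.map(flatten_dart, remove_columns=dart.column_names)

e2e = load_dataset("e2e_nlg", split="train")
e2e = e2e.map(flatten_e2e, remove_columns=e2e.column_names)

webnlg = load_dataset("web_nlg", "release_v3.0_en", split="train")
webnlg = webnlg.map(flatten_webnlg, remove_columns=webnlg.column_names)

mixed = concatenate_datasets([dart, e2e, webnlg]).shuffle(seed=42)
print(mixed[0])  # {"source": "...", "target": "..."}
```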
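For the prompt-engineering folder, the sketch below illustrates the augmentation step: a teacher LLM is asked to reason step by step over the linearized graph, and its rationale plus final text become the training target for the student T5 model. The teacher model, prompt wording, and helper names here are stand-ins, not the exact ones used in our notebooks.

```python
# Illustrative chain-of-thought augmentation step; teacher model and prompt are stand-ins.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

COT_PROMPT = (
    "Here is an RDF graph given as (subject | predicate | object) triples:\n{triples}\n"
    "Think step by step about what the graph states, then write one fluent sentence that "
    "verbalizes every triple. Answer in the format:\nReasoning: ...\nText: ..."
)

def augment_with_cot(triples: str) -> str:
    """Return the teacher's rationale plus verbalization for one linearized graph."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in teacher; any capable instruction-tuned LLM works
        messages=[{"role": "user", "content": COT_PROMPT.format(triples=triples)}],
        temperature=0.0,
    )
    return response.choices[0].message.content

# The student T5 is then fine-tuned on (linearized graph -> "Reasoning: ... Text: ...") pairs,
# so that it learns to emit the reasoning before the final verbalization.
print(augment_with_cot("<H> Alan_Shepard <R> birthPlace <T> New_Hampshire"))
```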
In our experiments, we use the following datasets: WebNLG 2017, WebNLG 2020, DART, and E2E.
For our experiments, we use the following metrics: BLEU, NLTK BLEU, chrF++, and BERTScore.
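Below is a sketch of how these metrics can be computed for a list of model outputs against reference texts, using `sacrebleu`, NLTK, and `bert_score`; the exact settings (tokenization, smoothing, reference handling) in our notebooks may differ.

```python
# Sketch of the evaluation metrics for a list of predictions and their references.
import sacrebleu
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from bert_score import score as bert_score

predictions = ["Alan Shepard was born in New Hampshire."]
references = [["Alan Shepard was born in New Hampshire."]]  # one or more references per prediction

# Corpus BLEU and chrF++ via sacrebleu (references are grouped into parallel reference streams).
refs_by_stream = list(zip(*references))
bleu = sacrebleu.corpus_bleu(predictions, refs_by_stream)
chrf = sacrebleu.corpus_chrf(predictions, refs_by_stream, word_order=2)  # word_order=2 -> chrF++

# NLTK BLEU on whitespace-tokenized text.
nltk_bleu = corpus_bleu(
    [[r.split() for r in refs] for refs in references],
    [p.split() for p in predictions],
    smoothing_function=SmoothingFunction().method1,
)

# BERTScore F1 (here against the first reference only).
_, _, f1 = bert_score(predictions, [refs[0] for refs in references], lang="en")

print(f"BLEU {bleu.score:.2f}  chrF++ {chrf.score:.2f}  "
      f"NLTK BLEU {nltk_bleu:.4f}  BERTScore {f1.mean().item():.4f}")
```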
- You can also find our notebooks with the cell outputs from training multiple models on multiple datasets in our legacy code folder.