
Enhancing Graph-to-Text Systems in Low-Resource Settings: Distilling Chain-of-Thought Reasoning for Task-Specific Workflows

Here we present the code needed to replicate the experiments from our final report, Enhancing Graph-to-Text Systems in Low-Resource Settings: Distilling Chain-of-Thought Reasoning for Task-Specific Workflows. All preprocessing and fine-tuning steps can be found in the notebooks in the following folders:

  1. PLMs finetuned for data to text: We fine-tune a pretrained FLAN-T5-small model on the 2017 release of the WebNLG dataset for the graph-to-text task, and evaluate performance with the BLEU, NLTK BLEU, and chrF++ metrics. This reproduces the paper "Investigating Pretrained Language Models for Graph-to-Text Generation" (Ribeiro et al., NLP4ConvAI workshop at EMNLP, 2021). A fine-tuning sketch follows this list.

  2. Adapter, PLM with GNN for data to text: Our attempt to reproduce the GNN adapter model from the paper "Structural Adapters in Pretrained Language Models for AMR-to-Text Generation" (Ribeiro et al., EMNLP 2021). A simplified adapter sketch follows this list.

  3. Knowledge transfer, PLM finetuned on different data-to-text tasks: To harness the potential of multi-task learning, we incorporate supplementary tasks that share the same fundamental structure as our primary graph-to-text task. We train FLAN-T5-base models on (1) WebNLG 2020, (2) DART, and (3) a mixed dataset of WebNLG 2020, DART, and E2E, then evaluate performance on each of the three datasets separately. A dataset-mixing sketch follows this list.

  4. Dataset augmentation via prompt engineering: Inspired by the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" and by knowledge-distillation techniques ("Textbooks Are All You Need"), we built a pipelined model that infuses chain-of-thought reasoning into T5-small and T5-base models on the WebNLG 2020 dataset for graph-to-text generation. A sketch of the augmentation step follows this list.
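For step 1, below is a minimal fine-tuning sketch, assuming the HuggingFace `transformers` and `datasets` libraries. The dataset config name and fields follow the `web_nlg` card on the Hub, and the linearization format and hyperparameters are illustrative assumptions, not the exact values used in our notebooks.

```python
# Sketch for step 1: fine-tune FLAN-T5-small on WebNLG 2017 (graph-to-text).
# Config/field names follow the `web_nlg` dataset card on the HuggingFace Hub;
# hyperparameters here are illustrative, not the repository's exact settings.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

raw = load_dataset("web_nlg", "webnlg_challenge_2017")  # splits: train/dev/test

def linearize(example):
    # Join the RDF triples ("subject | predicate | object") into one source
    # string; the first lexicalization serves as the target text.
    triples = example["modified_triple_sets"]["mtriple_set"][0]["mtriples"]
    enc = tokenizer("translate Graph to English: " + " && ".join(triples),
                    truncation=True, max_length=384)
    target = example["lex"]["text"][0] if example["lex"]["text"] else ""
    enc["labels"] = tokenizer(text_target=target, truncation=True,
                              max_length=128)["input_ids"]
    return enc

tokenized = raw.map(linearize, remove_columns=raw["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="flan-t5-small-webnlg2017",
                                  learning_rate=3e-4,
                                  per_device_train_batch_size=8,
                                  num_train_epochs=5),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["dev"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```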
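For step 2, here is a much-simplified PyTorch sketch of a structural (GNN) adapter. The bottleneck size and the plain adjacency-based graph convolution are illustrative assumptions; the original paper inserts relational graph convolutions into each encoder layer of the frozen PLM.

```python
# Sketch for step 2: a bottleneck adapter with one graph-convolution step.
import torch
import torch.nn as nn

class StructuralAdapter(nn.Module):
    """Adapter meant to sit after a frozen PLM encoder layer (simplified from
    StructAdapt; sizes and the message-passing rule are assumptions)."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # project down
        self.graph_conv = nn.Linear(bottleneck, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)     # project back up
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor,
                adjacency: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden); adjacency: row-normalized
        # (batch, seq_len, seq_len) matrix encoding the input graph structure.
        h = self.act(self.down(hidden_states))
        h = self.act(self.graph_conv(torch.bmm(adjacency, h)))  # message passing
        return hidden_states + self.up(h)  # residual keeps the PLM's signal

# Smoke test with random tensors.
adapter = StructuralAdapter(hidden_size=512)
x = torch.randn(2, 10, 512)
adj = torch.softmax(torch.randn(2, 10, 10), dim=-1)  # stand-in adjacency
print(adapter(x, adj).shape)  # torch.Size([2, 10, 512])
```

Because only the adapter parameters are trained while the PLM stays frozen, the residual connection lets the model fall back on the pretrained representations when the graph signal is uninformative.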
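For step 3, a sketch of pooling several data-to-text tasks into one training set. The Hub dataset IDs are real, but mapping every task into a shared `source`/`target` schema this way, and the unweighted 1:1 mixing, are assumptions about our preprocessing rather than the notebooks' exact code.

```python
# Sketch for step 3: pool DART and E2E (plus WebNLG) into one training set.
# Field names follow the `dart` and `e2e_nlg_cleaned` cards on the Hub.
from datasets import concatenate_datasets, load_dataset

def dart_pairs(ex):
    # DART stores each triple as a [subject, predicate, object] list.
    triples = [" | ".join(t) for t in ex["tripleset"]]
    return {"source": "translate Graph to English: " + " && ".join(triples),
            "target": ex["annotations"]["text"][0]}

def e2e_pairs(ex):
    # E2E uses a flat meaning representation rather than RDF triples.
    return {"source": "translate Graph to English: " + ex["meaning_representation"],
            "target": ex["human_reference"]}

dart = load_dataset("dart", split="train")
dart = dart.map(dart_pairs, remove_columns=dart.column_names)
e2e = load_dataset("e2e_nlg_cleaned", split="train")
e2e = e2e.map(e2e_pairs, remove_columns=e2e.column_names)

# WebNLG 2020 would be linearized as in the step-1 sketch and appended here.
mixed = concatenate_datasets([dart, e2e]).shuffle(seed=42)
```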
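For step 4, a sketch of the chain-of-thought augmentation pattern. The prompt wording and the `teacher_generate` callable are hypothetical placeholders for whichever teacher LLM produces the reasoning chains; only the overall distillation pattern follows the report.

```python
# Sketch for step 4: augment WebNLG examples with teacher-written reasoning.
# COT_PROMPT and `teacher_generate` are illustrative placeholders, not the
# repository's exact prompt or teacher model.
from typing import Callable, List

COT_PROMPT = """You are given RDF triples describing facts.
Triples: {triples}
First reason step by step about which entities and relations to mention,
then write one fluent sentence that verbalizes all of the facts.
Reasoning:"""

def build_augmented_example(triples: List[str],
                            teacher_generate: Callable[[str], str]) -> dict:
    # Ask the teacher LLM for a reasoning chain plus final verbalization.
    chain = teacher_generate(COT_PROMPT.format(triples=" && ".join(triples)))
    # The student T5 is then fine-tuned to reproduce the reasoning followed by
    # the sentence, distilling the chain of thought into the smaller model.
    return {"source": "translate Graph to English: " + " && ".join(triples),
            "target": chain}
```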

Datasets

In our experiments, we use the following datasets: WebNLG 2017, WebNLG 2020, DART, and E2E.

Metrics

For our experiments, we use the following metrics: BLEU, NLTK BLEU, chrF++, and BERTScore. A sketch of computing all four follows.
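Below is a minimal sketch of the four metrics using the HuggingFace `evaluate` library and NLTK; the toy prediction/reference pair is illustrative only.

```python
# Sketch of the evaluation metrics (requires `evaluate`, `sacrebleu`,
# `nltk`, and `bert_score`); the example strings are illustrative only.
import evaluate
from nltk.translate.bleu_score import corpus_bleu

predictions = ["Aarhus Airport serves the city of Aarhus, Denmark."]
references = [["Aarhus Airport serves Aarhus, Denmark."]]

bleu = evaluate.load("sacrebleu")
print("BLEU:", bleu.compute(predictions=predictions,
                            references=references)["score"])

# NLTK BLEU tokenizes differently from sacreBLEU, hence both are reported.
print("NLTK BLEU:", corpus_bleu([[r.split() for r in refs] for refs in references],
                                [p.split() for p in predictions]))

# chrF++ is chrF with word n-grams enabled (word_order=2).
chrf = evaluate.load("chrf")
print("chrF++:", chrf.compute(predictions=predictions, references=references,
                              word_order=2)["score"])

bertscore = evaluate.load("bertscore")
print("BERTScore F1:", bertscore.compute(predictions=predictions,
                                         references=references,
                                         lang="en")["f1"])
```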

  • You can also find our notebooks with their saved cell outputs, covering the training runs of multiple models, in our legacy code folder.
