Codebase for ProactiveAI in Conversations — an approach combining LLM priors with Q-adapters for task-oriented dialogue planning.
This README covers the following parts:
- Downloading LLM Weights
- How the model is trained
- How the model flows through the architecture
- Extra information
The architecture (see the architecture diagram) consists of four main components:
- Policy Planner
- Self-Play
- Critic LLM
- Replay Buffer
Reinforcement learning updates are computed from the experience stored in the replay buffer.
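For readers new to the idea, here is a minimal sketch of a replay buffer, assuming a fixed capacity and uniform sampling. The repository's own buffer (train_model.py#L228) may store different fields or sample differently; the Transition fields below are our assumption of what one dialogue-planning step records.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Transition:
    state: Any         # assumed: dialogue history or its encoding
    action: int        # assumed: index of the chosen dialogue strategy
    reward: float      # assumed: scalar reward derived from the Critic LLM
    next_state: Any
    done: bool         # whether the conversation ended at this turn


class ReplayBuffer:
    """Fixed-capacity FIFO buffer: once full, the oldest transitions are evicted."""

    def __init__(self, capacity: int, seed: int = 0):
        self.buffer: deque = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement; this breaks the correlation
        # between consecutive turns of the same conversation.
        # Raises ValueError if batch_size exceeds the number of stored transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```

Experience is pushed after every simulated turn, and the learner periodically draws minibatches with `sample`. Prioritized sampling is a common alternative when rare, high-reward conversations should be replayed more often.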
Download the LLM weights locally first; loading them from disk is faster than fetching them from the Hub on every run.
Steps:
- Adjust the model name: https://github.com/declare-lab/dialogxpert/blob/master/download_llm_weights.py#L4-5
- Run: python download_llm_weights.py
NOTE:
- To download different LLM weights, change repo_id in download_llm_weights.py.
- Make sure you are logged in to Hugging Face and have the necessary access tokens set up.
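The contents of download_llm_weights.py are not reproduced here. As a hedged sketch of what such a script can look like, huggingface_hub's snapshot_download fetches a whole model repository into a local folder. REPO_ID is a placeholder (not necessarily the model the authors used), and the llm_weights/ folder layout and the local_dir_for helper are our own assumptions.

```python
import os

# Placeholder: set this to the Hugging Face model you want to download.
REPO_ID = "Qwen/Qwen2.5-7B-Instruct"


def local_dir_for(repo_id: str, root: str = "llm_weights") -> str:
    """Hypothetical layout: map 'org/name' to '<root>/name'."""
    return os.path.join(root, repo_id.split("/")[-1])


def download_weights(repo_id: str = REPO_ID, root: str = "llm_weights") -> str:
    """Download every file of repo_id into a local folder and return its path."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Gated models need a token; snapshot_download picks up the one saved by
    # `huggingface-cli login` (or the HF_TOKEN environment variable).
    return snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id, root))
```

Calling download_weights() performs the download. The returned folder can then be passed to transformers' from_pretrained in place of the Hub ID, which is what makes later loads fast.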
Before you train the model:
- Decide which dataset to use.
- Set the dataset argument: the --data_name parameter in get_args_train.
- Update the dataset-specific functions in env.py:
  - LLM Policy Prompt: replace with {dataset_name}_prompt (choose from qwen_prompts.py)
  - Roleplay functions: replace with {dataset_name}_roleplay (choose from qwen_prompts.py)
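The steps above swap the prompt functions by hand. As an illustrative alternative (not how env.py currently works), the functions can be looked up by name from the prompts module using the --data_name value. resolve_prompt_fns is a hypothetical helper, and the "esconv" name and one-argument signatures in the stand-in module are assumptions; check qwen_prompts.py for the actual names.

```python
import types
from typing import Callable, Tuple


def resolve_prompt_fns(prompts_module: types.ModuleType,
                       data_name: str) -> Tuple[Callable, Callable]:
    """Hypothetical helper: fetch {data_name}_prompt and {data_name}_roleplay by name."""
    try:
        policy_prompt = getattr(prompts_module, f"{data_name}_prompt")
        roleplay = getattr(prompts_module, f"{data_name}_roleplay")
    except AttributeError as err:
        raise ValueError(f"no prompt functions found for dataset '{data_name}'") from err
    return policy_prompt, roleplay


# Stand-in for `import qwen_prompts`; real function names and signatures may differ.
fake_prompts = types.ModuleType("qwen_prompts")
fake_prompts.esconv_prompt = lambda history: f"[policy prompt] {history}"
fake_prompts.esconv_roleplay = lambda history: f"[roleplay prompt] {history}"

policy_prompt, roleplay = resolve_prompt_fns(fake_prompts, "esconv")
print(policy_prompt("Hi, I feel stressed."))
```

Assuming the parsed arguments expose data_name, the repository equivalent would be resolve_prompt_fns(qwen_prompts, args.data_name), so switching datasets would only require changing --data_name, and a typo fails loudly instead of silently using the wrong prompts.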
Once everything is set, run:
python train_model.py
Each step of the model flow maps to the code as follows:
Training starts: https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L165
Episode loading: https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L170
Action selection: https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L178
Self-play (System): https://github.com/declare-lab/dialogxpert/blob/master/env.py#L417
Self-play (User): https://github.com/declare-lab/dialogxpert/blob/master/env.py#L435
Critic LLM: https://github.com/declare-lab/dialogxpert/blob/master/env.py#L443
Replay Buffer: https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L228
Status Check: https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L243
Training the network: https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L255
Adjustments: https://github.com/declare-lab/dialogxpert/blob/master/llm_priors.py#L87
Prompts: https://github.com/declare-lab/dialogxpert/blob/master/qwen_prompts.py
Testing: https://github.com/declare-lab/dialogxpert/blob/master/train_model.py#L24
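To tie these steps together, here is a toy, dependency-free sketch of the loop, annotated with the step names above. Every component is a hypothetical stub: llm_prior_topk stands in for the LLM prior, simulate_user for the user self-play LLM, and critic_reward for the Critic LLM, whose verbal labels and reward values (CRITIC_SCALE) are invented for illustration. The Q-adapter is replaced by a Q-table keyed by turn index. Action selection restricts the greedy argmax to the candidates the prior proposes, the core idea of Q-learning with LLM priors. This is not the repository's implementation.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy stand-ins for the LLM-backed components in env.py / llm_priors.py.
ACTIONS = ["question", "reflection", "suggestion", "greeting"]
# Hypothetical verbal critic labels mapped to scalar rewards.
CRITIC_SCALE = {"worse": -1.0, "no change": -0.5, "better": 0.5, "solved": 1.0}

State = int                                          # toy state: the turn index
Transition = Tuple[State, str, float, State, bool]
QTable = Dict[Tuple[State, str], float]


def llm_prior_topk(history: List[str], k: int) -> List[str]:
    """Stub LLM prior: propose k candidate dialogue actions (ignores history here)."""
    return ACTIONS[:k]


def simulate_user(system_utt: str) -> str:
    """Stub user self-play: this simulated user only responds well to reflections."""
    return "I feel heard." if system_utt.startswith("[reflection]") else "Hmm, okay."


def critic_reward(user_utt: str) -> float:
    """Stub Critic LLM: map a verbal judgement of the user's state to a scalar."""
    label = "better" if "heard" in user_utt else "no change"
    return CRITIC_SCALE[label]


def greedy(q: QTable, s: State, candidates: List[str]) -> str:
    # Q-values only rank the actions the LLM prior proposed.
    return max(candidates, key=lambda a: q.get((s, a), 0.0))


def run_episode(q: QTable, buffer: List[Transition], rng: random.Random,
                max_turns: int = 3, k: int = 3, eps: float = 0.2) -> None:
    history: List[str] = []
    for s in range(max_turns):
        candidates = llm_prior_topk(history, k)              # action selection
        explore = rng.random() < eps
        a = rng.choice(candidates) if explore else greedy(q, s, candidates)
        system_utt = f"[{a}] system turn {s}"                 # self-play (system)
        user_utt = simulate_user(system_utt)                  # self-play (user)
        r = critic_reward(user_utt)                           # Critic LLM
        done = s == max_turns - 1                             # status check
        buffer.append((s, a, r, s + 1, done))                 # replay buffer
        history += [system_utt, user_utt]
        if done:
            break


def train_step(q: QTable, buffer: List[Transition], rng: random.Random,
               batch_size: int = 8, lr: float = 0.1, gamma: float = 0.9, k: int = 3) -> None:
    """One Q-learning update over a minibatch sampled uniformly from the buffer."""
    for s, a, r, s2, done in rng.sample(buffer, min(batch_size, len(buffer))):
        # Bootstrap only over the prior's candidates in the next state.
        next_best = 0.0 if done else max(q.get((s2, b), 0.0) for b in llm_prior_topk([], k))
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + lr * (r + gamma * next_best - old)


def train(n_episodes: int = 200, seed: int = 0) -> Tuple[QTable, List[Transition]]:
    rng = random.Random(seed)
    q: QTable = {}
    buffer: List[Transition] = []
    for _ in range(n_episodes):
        run_episode(q, buffer, rng)                           # collect experience
        train_step(q, buffer, rng)                            # train (here: a Q-table)
    return q, buffer


if __name__ == "__main__":
    q, buffer = train()
    print(greedy(q, 0, llm_prior_topk([], 3)))
```

Because the stub critic rewards only reflections, the greedy choice at turn 0 settles on "reflection". Narrowing the argmax to the prior's top-k candidates is what keeps the search tractable when the real action space is large.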
This project builds on open-source code from the following repositories, which we gratefully credit:
- PPDPP: https://github.com/dengyang17/PPDPP/tree/main
- DPDP: https://github.com/cs-holder/DPDP/tree/main
- RL-LLM: https://github.com/yanxue7/RL-LLM-Prior/tree/main
