A comprehensive framework for multi-turn reinforcement learning training of language model agents in gaming environments.
- Python 3.10
- CUDA-compatible GPU (A100, L40, or similar)
- Conda package manager
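A quick way to confirm these prerequisites are available is a small check like the one below (a sketch, not part of the repository; `nvidia-smi` is only present on machines with NVIDIA drivers installed):

```shell
# Sketch of a prerequisite check: report which required tools are on PATH.
check_prereqs() {
  for cmd in python3 conda nvidia-smi; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
    fi
  done
}
```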
- Create the conda environment:

  ```bash
  conda create --name lmgame_train python=3.10
  conda activate lmgame_train
  ```
- Set up authentication (optional but recommended):

  ```bash
  export WANDB_API_KEY=your_wandb_api_key
  export WANDB_ENTITY=your_wandb_entity
  export HF_TOKEN=your_huggingface_token
  ```
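These variables can be sanity-checked before launching training with a small helper like this (hypothetical; not part of the repository):

```shell
# Hypothetical helper (not part of this repo): warn if any of the optional
# authentication variables from the step above are missing from the environment.
check_auth_env() {
  local missing=0
  for var in WANDB_API_KEY WANDB_ENTITY HF_TOKEN; do
    if [ -z "${!var}" ]; then
      echo "warning: $var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

`check_auth_env` returns non-zero if any variable is unset, so a wrapper script could use it to gate a training launch.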
- Run the setup script:

  ```bash
  ./scripts/setup.sh
  ```
- Start training:

  ```bash
  source train_sokoban.sh
  ```

The framework is pre-configured for different GPU setups:
| GPU Type | Agent Groups | Group Size | Total Agents | Default Model |
|---|---|---|---|---|
| A100 (default) | 8 | 16 | 128 | Qwen/Qwen2.5-0.5B-Instruct |
| L40 | 4 | 2 | 8 | Qwen/Qwen2.5-0.5B-Instruct |
Note: The A100 configuration is the default setting in `configs/base.yaml`. For other GPUs, adjust `agent_group_num` and `agent_group_size` in the config file.
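For example, switching to the L40 row of the table above would mean settings like the following (the two field names come from the note above; their exact placement inside `configs/base.yaml` is an assumption):

```yaml
# Sketch only: field placement in configs/base.yaml is assumed, not verified.
agent_group_num: 4    # agent groups (L40 row of the table above)
agent_group_size: 2   # agents per group; 4 * 2 = 8 total agents
```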
- System Design Overview - Architecture and design principles
- Development Guide - Contributing and development workflow
This project is licensed under the MIT License - see the LICENSE file for details.