This framework lets you simulate how different Large Language Models (LLMs) handle public goods dilemmas and whether they'll pay the price to enforce cooperation through sanctions. Based on classic behavioral economics experiments, we explore whether LLMs prefer cooperative institutions that allow for costly norm enforcement.
- 🔄 Traditional LLMs unexpectedly outperform reasoning-focused models at cooperation
- 🏆 Some models achieve near-human cooperation levels but using different strategies
- 🎁 LLMs strongly prefer rewarding cooperation, while humans favor punishing defection
- 📊 Four distinct behavioral patterns emerge across model architectures
Understanding how LLMs cooperate can help us:
- Build better multi-agent AI systems that work together
- Explore alignment techniques for collaborative AI
- Identify which models might be better suited for cooperative tasks
- Compare AI social behaviors with human patterns
Our research suggests that the current approach of improving LLMs by enhancing reasoning capabilities does not necessarily improve cooperation: traditional models often cooperate better than reasoning-optimized ones.
Run a complete simulation with default parameters:
```bash
python main.py --api-provider openai --model-name gpt-4o
```

The codebase consists of these key components:
- Core Simulation Files:
  - `agent.py`: LLM-based simulation participants
  - `environment.py`: Game environment and round progression
  - `institution.py`: Sanctioning and Sanction-Free institutions
  - `parameters.py`: Configurable simulation parameters
  - `main.py`: Entry point for running experiments
- API Client Files:
  - `azure_openai_client.py`: Azure OpenAI API
  - `openai_client.py`: OpenAI API
  - `openrouter_client.py`: OpenRouter API
  - `kluster_ai_client.py`: KlusterAI API
- Clone this repository
- Install the required dependencies:
```bash
pip install openai pandas numpy matplotlib backoff tqdm
```

The simulation is configured through `parameters.py`:
| Parameter | Description | Default |
|---|---|---|
| `NUM_AGENTS` | Number of simulation participants | 7 |
| `NUM_ROUNDS` | Number of simulation rounds | 15 |
| `PUBLIC_GOOD_MULTIPLIER` | Multiplication factor applied to the public pool | 1.6 |
| `INITIAL_TOKENS` | Starting tokens per agent | 1000 |
| `ENDOWMENT_STAGE_1` | Tokens per round for contributions | 20 |
| `ENDOWMENT_STAGE_2` | Tokens per round for sanctioning | 20 |
| `PUNISHMENT_EFFECT` | Payoff impact per punishment token | -3 |
| `REWARD_EFFECT` | Payoff impact per reward token | +1 |
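Taken together, these defaults describe a standard linear public goods game. The sketch below illustrates the stage-1 payoff under that standard formula; it is only an illustration of the parameters above, and the actual computation lives in `environment.py` and `institution.py`:

```python
def stage1_payoff(contribution, total_contributions,
                  endowment=20, multiplier=1.6, num_agents=7):
    """Linear public-goods payoff: tokens kept plus an equal
    share of the multiplied common pool (default parameters)."""
    public_share = multiplier * total_contributions / num_agents
    return endowment - contribution + public_share

# If all 7 agents contribute their full 20-token endowment:
full = stage1_payoff(20, 7 * 20)   # 1.6 * 140 / 7 = 32.0 each
# A lone free-rider against 6 full contributors earns more:
rider = stage1_payoff(0, 6 * 20)   # 20 + 1.6 * 120 / 7 ≈ 47.4
```

Because 1.6/7 < 1, each agent privately gains by withholding tokens even though full contribution maximizes the group total, which is exactly the dilemma the sanctioning institution is meant to resolve.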
The simulation supports multiple LLM providers:
- Azure OpenAI:
  ```bash
  python main.py --api-provider azure --deployment-name YOUR_DEPLOYMENT --azure-endpoint YOUR_ENDPOINT
  ```
- OpenAI:
  ```bash
  python main.py --api-provider openai --model-name MODEL_NAME
  ```
- OpenRouter:
  ```bash
  python main.py --api-provider openrouter --model-name MODEL_NAME
  ```
- KlusterAI:
  ```bash
  python main.py --api-provider kluster --model-name MODEL_NAME
  ```

You can also configure API access via environment variables:
- Azure: `AZURE_API_KEY`, `AZURE_ENDPOINT`, `AZURE_DEPLOYMENT_NAME`
- OpenAI: `OPENAI_API_KEY`, `OPENAI_MODEL_NAME`
- OpenRouter: `OPENROUTER_API_KEY`
Simulation results are saved as JSON files with the naming pattern:
`simulation_results_{model_name}_{num_agents}agents_{num_rounds}rounds.json`
Each file contains detailed agent decisions, including:
- Institution choices with reasoning
- Contribution amounts with reasoning
- Punishment/reward allocations with reasoning
- Payoffs and cumulative statistics
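For quick post-hoc analysis, a results file can be located by reconstructing the filename from the pattern above. This is only a sketch: the loader follows the documented naming pattern, but the inner field names shown in the comments (`rounds`, `agents`, `contribution`) are hypothetical and should be checked against your actual output:

```python
import json
from pathlib import Path

def load_results(model_name, num_agents=7, num_rounds=15):
    """Load a results file saved under the documented naming pattern."""
    path = Path(f"simulation_results_{model_name}_"
                f"{num_agents}agents_{num_rounds}rounds.json")
    with path.open() as f:
        return json.load(f)

# Example (field names are illustrative, not guaranteed):
# results = load_results("gpt-4o")
# for rnd in results["rounds"]:
#     contribs = [a["contribution"] for a in rnd["agents"]]
#     print(rnd["round"], sum(contribs) / len(contribs))
```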
The repository includes code for analyzing agent reasoning patterns and classifying strategies.
If using this codebase for research, please cite our paper:
Try the framework with different models, contribute new analysis methods, or cite our work in your research on AI cooperation.
This project is provided for research purposes.