This document covers running parameter sweep experiments using grid configurations.
Grid experiments allow you to run multiple experiment configurations from a single config file. The system:
- Expands parameter combinations into individual configurations
- Runs each configuration sequentially
- Tracks progress in a grid manifest for crash recovery
- Supports resuming interrupted grid runs
A grid config file has a special _grid section that defines parameter sweeps:
_grid:
  # Derivation rules: compute values from other fields
  derive:
    experiment_metadata.questions_file: "data/BBQ/{experiment_metadata.benchmark_subcategory}.jsonl"
  # Sweep parameters: all combinations will be generated
  sweep:
    experiment_metadata.experiment_name:
      - exp_variant
    prompt_template_config.for_participant.choice_display_format:
      - bullet
      - roman_dot
      - letter_colon
      - none
  # Broadcast: apply same value to multiple paths (linked sweep)
  broadcast:
    all_agent_temps:
      paths:
        - agent_definitions.0.temperature
        - agent_definitions.1.temperature
      values:
        - 0.0
        - 0.7
  # Zip: paired values that change together (not Cartesian product)
  zip:
    model_definitions:
      - model_definitions.llm_0.model_path: meta-llama/Llama-3.3-70B-Instruct
        model_definitions.llm_0.vllm_config.tensor_parallel_size: 2
      - model_definitions.llm_0.model_path: google/gemma-2-27b-it
        model_definitions.llm_0.vllm_config.tensor_parallel_size: 1
# Base configuration (same as regular config)
experiment_metadata:
  experiment_name: _  # Will be overridden by sweep
  # ...
model_definitions:
  llm_0:
    model_path: _  # Will be overridden by sweep
    # ...

- sweep: Each parameter sweeps independently, creating a Cartesian product of all values
- broadcast: Apply the same value to multiple config paths simultaneously (linked sweep)
- zip: Group multiple parameters that must change together as paired value sets
Use broadcast when you want to apply the same value to multiple locations, e.g., setting the same temperature for all agents.
Use zip when parameters are interdependent and must change together, e.g., model path and its corresponding tensor parallel size.
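The grid expander takes the Cartesian product of all sweep parameters, broadcast groups, and zip groups. Each broadcast group and each zip group counts as a single dimension, however many paths it sets.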
Each expanded config is a complete, valid experiment configuration.
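Each expanded configuration is the base config with one combination of overrides written at dotted paths, after which the derive rules are filled in. The sketch below shows one way that could work. It assumes that numeric path segments (as in agent_definitions.0.temperature) index into lists, and that derive templates are rendered from the already-overridden config. The helpers set_path, get_path and build_task_config are hypothetical names, not the expander's actual API, and the "Age" subcategory is an illustrative value.

```python
import copy
import re
from typing import Any


def _step(node: Any, key: str) -> Any:
    # Assumption: numeric segments index into lists; everything else is a dict key.
    return node[int(key)] if isinstance(node, list) else node[key]


def set_path(config: dict, dotted: str, value: Any) -> None:
    """Set config[a][b][c] = value for a dotted path 'a.b.c'."""
    *parents, last = dotted.split(".")
    node: Any = config
    for key in parents:
        if isinstance(node, dict) and key not in node:
            node[key] = {}  # create missing intermediate mappings
        node = _step(node, key)
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


def get_path(config: dict, dotted: str) -> Any:
    node: Any = config
    for key in dotted.split("."):
        node = _step(node, key)
    return node


_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def build_task_config(base: dict, overrides: dict, derive: dict) -> dict:
    """Apply one combination's overrides to a copy of the base, then derive fields."""
    config = copy.deepcopy(base)  # the base is shared by every combination
    for path, value in overrides.items():
        set_path(config, path, value)
    # Derived fields are rendered last so they see the swept values.
    for target, template in derive.items():
        rendered = _PLACEHOLDER.sub(lambda m: str(get_path(config, m.group(1))), template)
        set_path(config, target, rendered)
    return config


# Illustrative base config (hypothetical values).
base = {
    "experiment_metadata": {"experiment_name": "_", "benchmark_subcategory": "Age"},
    "agent_definitions": [{"temperature": 1.0}, {"temperature": 1.0}],
}
cfg = build_task_config(
    base,
    overrides={
        "experiment_metadata.experiment_name": "exp_variant",
        "agent_definitions.0.temperature": 0.7,
        "agent_definitions.1.temperature": 0.7,
    },
    derive={"experiment_metadata.questions_file":
            "data/BBQ/{experiment_metadata.benchmark_subcategory}.jsonl"},
)
print(cfg["experiment_metadata"]["questions_file"])  # data/BBQ/Age.jsonl
```

Copying the base before each combination keeps one configuration's overrides from leaking into the next.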
Example: if you have
- 2 sweep values for choice_display_format
- 2 broadcast values for temperature
- 2 zip value sets for model_definitions
you get 2 x 2 x 2 = 8 configurations in total.
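The count follows from treating every sweep parameter, every broadcast group, and every zip group as one axis of a Cartesian product. Below is a minimal sketch that assumes the _grid section has already been parsed into a Python dict, for example with a YAML loader. expand_grid is a hypothetical name, and the real expander may order or label combinations differently. Derive rules are not axes, so the sketch ignores them.

```python
import itertools


def expand_grid(grid: dict) -> list:
    """Return one {dotted_path: value} override dict per configuration."""
    axes = []
    # sweep: each parameter is an independent axis
    for path, values in grid.get("sweep", {}).items():
        axes.append([{path: v} for v in values])
    # broadcast: one axis per group; each value is copied to every listed path
    for group in grid.get("broadcast", {}).values():
        axes.append([{p: v for p in group["paths"]} for v in group["values"]])
    # zip: one axis per group; each entry already pairs its values
    for entries in grid.get("zip", {}).values():
        axes.append([dict(entry) for entry in entries])
    configs = []
    # Cartesian product across all axes
    for picks in itertools.product(*axes):
        merged: dict = {}
        for part in picks:
            merged.update(part)
        configs.append(merged)
    return configs


example = {
    "sweep": {
        "prompt_template_config.for_participant.choice_display_format": ["bullet", "none"],
    },
    "broadcast": {
        "all_agent_temps": {
            "paths": ["agent_definitions.0.temperature", "agent_definitions.1.temperature"],
            "values": [0.0, 0.7],
        },
    },
    "zip": {
        "model_definitions": [
            {"model_definitions.llm_0.model_path": "meta-llama/Llama-3.3-70B-Instruct",
             "model_definitions.llm_0.vllm_config.tensor_parallel_size": 2},
            {"model_definitions.llm_0.model_path": "google/gemma-2-27b-it",
             "model_definitions.llm_0.vllm_config.tensor_parallel_size": 1},
        ],
    },
}
configs = expand_grid(example)
print(len(configs))  # 8
```

The same rule applied to the full _grid section at the top of this page (1 experiment name x 4 display formats x 2 temperatures x 2 models) gives 16 configurations. Keeping each zip entry as one unit stops the 70B model from being paired with tensor_parallel_size: 1.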
Note: Grid configuration is the only entry point. Even single-task experiments should use a grid config with one configuration.
# Run all configurations in a grid (--grid flag is required)
python script/run_job.py config/my_grid_config.yaml --grid
# Dry run to see expanded configurations
python script/run_job.py config/my_grid_config.yaml --grid --dry-run
# Example with environment variables set (e.g., selecting GPUs 0 and 1)
CUDA_VISIBLE_DEVICES=0,1 \
OMP_NUM_THREADS=32 \
MAC_FAIRNESS_LIVE_STATUS=1 \
python script/run_job.py config/my_grid_config.yaml --grid
Each grid configuration (task):
- Gets its own experiment directory
- Generates its own task manifest, transcripts, and task summary
- Uses a unique job_task_id of the form {pid}_{grid_index} (e.g., 12345_0, 12345_1)
Grid manifests track progress across all configurations in a grid run:
bookkeeping/grid_manifest/{timestamp}_{pid}.json
The grid config is also snapshotted at:
bookkeeping/_grid_config_snapshot/{config_name}_{timestamp}.yaml
Example grid manifest ("// ..." marks elided content):
{
  "grid_config_snapshot_path": "$MAC_FAIRNESS_WORKSPACE/bookkeeping/_grid_config_snapshot/...",
  "pid": 12345,
  "submission_timestamp": "2025-12-13T10:00:00.000Z",
  "num_tasks_planned": 8,
  "num_tasks_succeeded": 2,
  "tasks": [
    {
      "task_id": 0,
      "experiment_name": "exp_variant",
      "grid_sweep_specs": {
        // ...
      },
      "status": "succeeded"
    },
    {
      "task_id": 1,
      "experiment_name": "exp_variant",
      "grid_sweep_specs": {
        // ...
      },
      "status": "started"
    },
    {
      "task_id": 2,
      "experiment_name": "exp_variant",
      "grid_sweep_specs": {
        // ...
      },
      "status": null
    }
    // ...
  ]
}
Note: Each task's job_task_id is {pid}_{task_id} (e.g., 12345_0, 12345_1).
Status values:
- null: Not yet started
- "started": Started but not completed (may have partial progress in the task manifest)
- "succeeded": All questions completed successfully
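The status field is enough to see which tasks still need work. This sketch groups task ids by status. It assumes that real manifest files are plain JSON (the "// ..." lines in the example are only elisions) and that each task entry has task_id and status keys, as in the example. load_manifest and group_by_status are hypothetical helpers.

```python
import json
from pathlib import Path


def load_manifest(path: Path) -> dict:
    # Assumption: the file on disk is plain JSON without comments.
    return json.loads(Path(path).read_text())


def group_by_status(manifest: dict) -> dict:
    """Map each status (None, "started", "succeeded") to its task ids."""
    groups = {None: [], "started": [], "succeeded": []}
    for task in manifest["tasks"]:
        groups.setdefault(task["status"], []).append(task["task_id"])
    return groups


manifest = {
    "pid": 12345,
    "tasks": [
        {"task_id": 0, "experiment_name": "exp_variant", "status": "succeeded"},
        {"task_id": 1, "experiment_name": "exp_variant", "status": "started"},
        {"task_id": 2, "experiment_name": "exp_variant", "status": None},
    ],
}
groups = group_by_status(manifest)
# job_task_ids ({pid}_{task_id}) of the unfinished tasks in this run
remaining = [f"{manifest['pid']}_{tid}" for tid in groups["started"] + groups[None]]
print(groups)     # {None: [2], 'started': [1], 'succeeded': [0]}
print(remaining)  # ['12345_1', '12345_2']
```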
Grid manifests and their corresponding grid config snapshots are deleted when all configurations complete successfully. If any fail, both persist for resume.
Resume a grid run when:
- The process was interrupted (Ctrl+C, timeout, crash)
- Some configurations failed and you want to retry them
- The grid manifest still exists in bookkeeping/grid_manifest/
Resume requires using the grid config snapshot path (not the original config):
# Find your snapshot
ls bookkeeping/_grid_config_snapshot/
# Resume using the snapshot path
[ENV_VARS] python script/run_job.py bookkeeping/_grid_config_snapshot/{config_name}_{timestamp}.yaml --grid --resume
# Dry run to see tree with null questions per started task
python script/run_job.py bookkeeping/_grid_config_snapshot/{config_name}_{timestamp}.yaml --grid --resume --dry-run
Using the snapshot ensures the same parameter combinations are used even if you've edited the original config file.
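If a config has several snapshots, you can pick the newest one with a short script instead of `ls`. This sketch relies on file modification time rather than parsing the {timestamp} part of the name, because the timestamp format is not specified here. It builds only the resume command shown above. latest_snapshot and resume_argv are hypothetical helpers, and t1/t2 stand in for real timestamps.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional

SNAPSHOT_DIR = Path("bookkeeping") / "_grid_config_snapshot"


def latest_snapshot(workspace: Path, config_name: str) -> Optional[Path]:
    """Newest snapshot for config_name, judged by file modification time."""
    candidates = list((workspace / SNAPSHOT_DIR).glob(f"{config_name}_*.yaml"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def resume_argv(snapshot: Path, dry_run: bool = False) -> List[str]:
    """Build the documented resume command (snapshot path, never the original)."""
    argv = ["python", "script/run_job.py", str(snapshot), "--grid", "--resume"]
    if dry_run:
        argv.append("--dry-run")
    return argv


# Demo in a throwaway directory with two snapshots of the same config.
with tempfile.TemporaryDirectory() as tmp:
    snap_dir = Path(tmp) / SNAPSHOT_DIR
    snap_dir.mkdir(parents=True)
    older = snap_dir / "my_grid_config_t1.yaml"
    newer = snap_dir / "my_grid_config_t2.yaml"
    for i, p in enumerate([older, newer]):
        p.write_text("_grid: {}\n")
        os.utime(p, (1_000 + i, 1_000 + i))  # force distinct mtimes
    chosen = latest_snapshot(Path(tmp), "my_grid_config")
    argv = resume_argv(chosen, dry_run=True)

print(chosen.name)  # my_grid_config_t2.yaml
```

The glob prefix also matches other configs whose names start the same way (my_grid_config_v2_..., for instance), so check the chosen path before resuming.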
On resume, the runner:
- Loads the grid manifest: finds the existing manifest for the config file
- Creates a new grid manifest with pending and started tasks carried over (atomic create-then-delete)
- For started tasks:
  - Finds the task manifest from the previous run
  - Identifies questions with null status (not succeeded)
  - Resumes only those questions using the original config snapshot (no duplicates are created)
- For pending tasks: runs them fresh
- Updates the grid manifest: marks each task as succeeded on completion
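"Create-then-delete" means the old manifest is removed only after the new one is safely on disk, so a crash at any point leaves at least one complete manifest. A sketch of that pattern follows. It writes through a temporary file and os.replace, drops succeeded tasks, and changes only pid among the top-level fields. Which other fields the real runner rewrites is not specified here. write_json_atomic and roll_over_manifest are hypothetical helpers, and the t1/t2 file names stand in for {timestamp}.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX when on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def roll_over_manifest(old_path: Path, new_path: Path, new_pid: int) -> dict:
    """Create the new grid manifest first; delete the old one only afterwards."""
    old = json.loads(old_path.read_text())
    carried = [t for t in old["tasks"] if t["status"] != "succeeded"]
    new = {**old, "pid": new_pid, "tasks": carried}
    write_json_atomic(new_path, new)
    old_path.unlink()  # a crash before this line leaves the old manifest intact
    return new


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    old_path = d / "t1_12345.json"
    old_path.write_text(json.dumps({"pid": 12345, "tasks": [
        {"task_id": 0, "status": "succeeded"},
        {"task_id": 1, "status": "started"},
        {"task_id": 2, "status": None},
    ]}))
    new = roll_over_manifest(old_path, d / "t2_67890.json", new_pid=67890)
    files = sorted(p.name for p in d.iterdir())

print([t["task_id"] for t in new["tasks"]], files)  # [1, 2] ['t2_67890.json']
```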
        Grid Resume
             │
             ▼
    ┌─────────────────┐
    │ Load grid       │
    │ manifest        │
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐     ┌─────────────────┐
    │ Task status?    │────▶│ succeeded       │──▶ Skip
    └────────┬────────┘     └─────────────────┘
             │
             │                     ┌─────────────────┐
             ├── pending (null) ──▶│ Run fresh       │
             │                     └─────────────────┘
             │
             │ started
             ▼
    ┌─────────────────┐
    │ Find task       │
    │ manifest        │
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐     ┌─────────────────┐
    │ Null questions? │────▶│ None            │──▶ Skip (all succeeded)
    └────────┬────────┘     └─────────────────┘
             │
             │ Some null
             ▼
    ┌─────────────────┐
    │ Resume with     │
    │ question_ids +  │
    │ original config │
    │ snapshot        │
    └─────────────────┘
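The flow above reduces to one small decision per task. The function below follows the diagram, including the missing-task-manifest fallback covered under troubleshooting. It assumes a hypothetical task-manifest layout in which "questions" maps each question id to its status, with None meaning not succeeded. plan_task and its action labels are illustrative, not the runner's real API.

```python
from typing import List, Optional, Tuple


def plan_task(status: Optional[str], task_manifest: Optional[dict]) -> Tuple[str, List[str]]:
    """Decide what resume does with one grid task; returns (action, question_ids)."""
    if status == "succeeded":
        return "skip", []
    if status is None:  # pending: never started
        return "run_fresh", []
    # status == "started": look at the per-question record from the previous run
    if task_manifest is None:
        # Crashed before the task manifest existed: rerun everything and warn.
        return "run_fresh_with_warning", []
    null_ids = [qid for qid, q in task_manifest["questions"].items() if q is None]
    if not null_ids:
        return "skip", []  # every question already succeeded
    return "resume", null_ids


# Hypothetical task manifest layout: question id -> status (None = not succeeded)
tm = {"questions": {"q_098": "succeeded", "q_099": "succeeded", "q_100": None, "q_101": None}}
for status, manifest in [("succeeded", None), (None, None), ("started", None), ("started", tm)]:
    print(status, plan_task(status, manifest))
# succeeded ('skip', [])
# None ('run_fresh', [])
# started ('run_fresh_with_warning', [])
# started ('resume', ['q_100', 'q_101'])
```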
When using --dry-run --resume, started tasks show a tree with null question details:
------------------------------------------------------------
Configurations to resume:
[0] exp_vary_temp_0.5 (started)
├── Progress: 45/100 succeeded
├── Null questions: 55
│ ├── q_100
│ ├── q_101
│ ├── q_102
│ ├── q_103
│ ├── q_104
│ └── ... and 50 more
[1] exp_vary_temp_0.7 (pending)
Dry run complete. 2 configurations would be resumed.
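The dry-run listing is a simple tree: one header line per configuration and, for started tasks, the progress counts plus a preview of the null question ids. Here is a sketch of that formatting. It assumes a preview of five ids, as in the sample. render_resume_entry is a hypothetical helper, and the real output may differ in spacing.

```python
from typing import Sequence


def render_resume_entry(index: int, name: str, status: str, succeeded: int = 0,
                        total: int = 0, null_ids: Sequence[str] = (), preview: int = 5) -> str:
    """Format one configuration in the style of the dry-run listing above."""
    lines = [f"[{index}] {name} ({status})"]
    if status != "started":
        return "\n".join(lines)  # pending tasks get a single line
    lines.append(f"├── Progress: {succeeded}/{total} succeeded")
    lines.append(f"├── Null questions: {len(null_ids)}")
    shown = list(null_ids[:preview])
    hidden = len(null_ids) - len(shown)
    for i, qid in enumerate(shown):
        is_last = i == len(shown) - 1 and hidden == 0
        lines.append(f"│   {'└──' if is_last else '├──'} {qid}")
    if hidden:
        lines.append(f"│   └── ... and {hidden} more")
    return "\n".join(lines)


null_ids = [f"q_{n}" for n in range(100, 155)]  # 55 questions, as in the sample
print(render_resume_entry(0, "exp_vary_temp_0.5", "started", 45, 100, null_ids))
print(render_resume_entry(1, "exp_vary_temp_0.7", "pending"))
```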
Important: Always use the grid config snapshot path for resume, not the original config. The snapshot ensures identical parameter expansion even if you've edited the original config.
Before launching a full grid, check what will run:
# See what configurations will run
python script/run_job.py config/my_grid_config.yaml --grid --dry-run
Test with a subset before running the full grid:
_grid:
  sweep:
    "experiment_metadata.experiment_name":
      - "test_only"  # Just one value for testing
Grid runs print progress for each configuration:
============================================================
Configuration 3/8 (index 2): exp_variant
============================================================
If a grid run fails, check the manifest:
ls bookkeeping/grid_manifest/
cat bookkeeping/grid_manifest/{timestamp}_{pid}.json | python -m json.tool
Resume interrupted runs using the snapshot path:
# Find and use the snapshot path
ls bookkeeping/_grid_config_snapshot/
[ENV_VARS] python script/run_job.py bookkeeping/_grid_config_snapshot/{config_name}_{timestamp}.yaml --grid --resume
If the resume command cannot find a grid manifest for the config, one of the following applies:
- The grid run completed successfully (manifest was deleted)
- The grid run never started
- You're using the wrong config file
If all tasks in the grid manifest are already marked as succeeded, the grid run is complete and there is nothing to resume.
If a task was marked "started" in the grid manifest but no task manifest exists:
- The task will run all questions from scratch (warning is printed)
- This can happen if the process crashed before creating the task manifest
- This is intended: without a task manifest, there is no record of which questions succeeded