Phase P5 — the headline result.
- Scenarios gain an `orchestration` field. The harness runs the same scenario under each shape.
- Shapes: `flat` | `waves` | `scatter-gather` | `team-mode` | `subagents` (canonical in Axiom §17.1)
- New flag: `pawbench --orchestration flat,waves,scatter-gather`; `pawbench-compare` gains a `--by orchestration` pivot
- New scenario: `bench/scenarios/pawstyle-orchestration-matrix.json` with ≥ 4 independent feature blocks
- New SLI: `orchestration_dqs_spread = max(DQS) − min(DQS)` across shapes on the same task. This is the metric that lets us re-derive Fabian's finding from our own data.
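The spread SLI can be sketched in a few lines. This is only an illustration under assumptions, not the pawbench implementation: the function name, the shape-to-score mapping as input, and the minimum-of-two-shapes check are all hypothetical. The sample values reuse the Team Mode 85% and Sub-Agents 57% figures from the challenge cited in this section as stand-ins for DQS; they are not pawbench measurements.

```python
from typing import Mapping


def orchestration_dqs_spread(dqs_by_shape: Mapping[str, float]) -> float:
    """Return max(DQS) - min(DQS) across shapes for a single task.

    `dqs_by_shape` maps a shape name (e.g. "team-mode") to the DQS that
    shape achieved on the task. The input layout is an assumption.
    """
    if len(dqs_by_shape) < 2:
        # A spread over fewer than two shapes is not meaningful.
        raise ValueError("need DQS for at least two shapes")
    scores = list(dqs_by_shape.values())
    return max(scores) - min(scores)


# Stand-in values taken from the cited challenge result (percent scores),
# used here only to show the arithmetic.
sample = {"team-mode": 85, "subagents": 57}
spread = orchestration_dqs_spread(sample)
print(spread)  # 28
```

A spread near zero would suggest the orchestration shape barely matters for that task. A large spread is the effect this phase is meant to surface.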
Gate to P6: 5-shape matrix lands in leaderboard with at least one CI run exercising all five shapes E2E.
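The shape-coverage part of this gate could be checked mechanically. The sketch below assumes each CI run can be reduced to the list of shapes it exercised; the helper names and that input layout are hypothetical. It covers only the "all five shapes" condition, not whether the run was end-to-end or whether results reached the leaderboard.

```python
from typing import Iterable, List

# The five shapes named in this phase.
CANONICAL_SHAPES = frozenset(
    {"flat", "waves", "scatter-gather", "team-mode", "subagents"}
)


def missing_shapes(shapes_in_run: Iterable[str]) -> List[str]:
    """Return the canonical shapes a run did not exercise, sorted by name."""
    return sorted(CANONICAL_SHAPES - set(shapes_in_run))


def shape_coverage_met(runs: Iterable[Iterable[str]]) -> bool:
    """True if at least one run exercised all five canonical shapes."""
    return any(not missing_shapes(run) for run in runs)


# Hypothetical run records: one partial run, one full run.
partial = ["flat", "waves", "scatter-gather"]
full = ["flat", "waves", "scatter-gather", "team-mode", "subagents"]

print(missing_shapes(partial))              # ['subagents', 'team-mode']
print(shape_coverage_met([partial]))        # False
print(shape_coverage_met([partial, full]))  # True
```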
Open question: `team-mode` and `subagents` need precise operational contracts before this lands. They likely map to waves + shared-scratchpad and scatter-gather-without-merge, respectively. A design doc is required.
Inspired by Fabian Wesner's One-Shot Shop Challenge — Team Mode 85% vs Sub-Agents 57% on the same model.