Priority: High
Three scenarios are too few to give statistically significant results. The suite needs broader coverage.
Proposed Scenario Categories
| Category | Examples | Count |
| --- | --- | --- |
| REST API (Python) | CRUD, auth, pagination, websockets | 5 |
| REST API (Go) | Chi router, middleware, gRPC gateway | 3 |
| CLI tools | argparse, file processing, data pipelines | 3 |
| React/frontend | Components, state management, forms | 4 |
| Full-stack | Frontend + backend + database | 3 |
| Refactoring | Split monolith, extract modules | 3 |
| Bug fixing | Given broken code, fix the bug | 5 |
| Test writing | Given code, write comprehensive tests | 3 |
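As a quick check on scope, the per-category counts proposed above can be tallied. This is only a sketch: the dictionary name `planned_counts` is illustrative and not part of any existing tooling.

```python
# Proposed scenario counts per category, copied from the table above.
# The variable names here are illustrative only.
planned_counts = {
    "REST API (Python)": 5,
    "REST API (Go)": 3,
    "CLI tools": 3,
    "React/frontend": 4,
    "Full-stack": 3,
    "Refactoring": 3,
    "Bug fixing": 5,
    "Test writing": 3,
}

total = sum(planned_counts.values())
print(f"{len(planned_counts)} categories, {total} scenarios planned")
```

That comes to 29 proposed scenarios across 8 categories, compared with the 3 that exist today.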
Contribution Model
Community contributions via the scenario request issue template.
Each scenario needs: 2+ agents, 3+ turns, tool calls, expect blocks, at least one steering/nudge variant.
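The requirements above could be enforced with a small check at review time. The sketch below assumes a hypothetical in-memory shape for a scenario (`Scenario`, `Turn`, and their field names are invented for illustration and do not reflect the project's actual scenario format); only the thresholds come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    # Hypothetical fields: names of tools called and expectations asserted in this turn.
    tool_calls: list[str] = field(default_factory=list)
    expects: list[str] = field(default_factory=list)


@dataclass
class Scenario:
    # Hypothetical shape; the real scenario format may differ.
    name: str
    agents: list[str]
    turns: list[Turn]
    variants: list[str] = field(default_factory=list)


def check_scenario(scenario: Scenario) -> list[str]:
    """Return a list of unmet contribution requirements (empty if all are met)."""
    problems = []
    if len(scenario.agents) < 2:
        problems.append("needs at least 2 agents")
    if len(scenario.turns) < 3:
        problems.append("needs at least 3 turns")
    if not any(turn.tool_calls for turn in scenario.turns):
        problems.append("needs at least one tool call")
    if not any(turn.expects for turn in scenario.turns):
        problems.append("needs at least one expect block")
    if not any(v in ("steering", "nudge") for v in scenario.variants):
        problems.append("needs at least one steering/nudge variant")
    return problems
```

Returning every unmet requirement at once, instead of failing on the first, lets a contributor fix a submission in a single pass.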