Benchmark mode: measure context compression vs task success tradeoff

## Background

Building a 5-tier context compiler for agent dispatch showed that context quality matters more than context quantity: 205-word summaries outperform 24K-word full trajectories (SWE Context Bench, Oxford 2026). Yet no existing benchmark measures this tradeoff systematically.

## Proposed Addition

Add a benchmark mode that measures how context compression affects task completion:

### Configurations to test
1. **No compression** (`--context-mode full`): full file contents, all rules, no filtering
2. **Manifest only** (`--context-mode manifest`): file paths + line counts, no content
3. **Tiered** (`--context-mode tiered`): submodular tier assignment (full/skeleton/summary/manifest)
4. **Minimal** (`--context-mode none`): just the task description, no file context
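
To make the four configurations concrete, here is a minimal Python sketch of how a context string could be assembled per mode. Everything in it is hypothetical: `build_context`, `skeleton`, and the `tiers` mapping are illustration names, not pawbench APIs. The submodular tier assignment itself is not implemented here; tiers are passed in as a plain dict, the skeleton is a crude "keep `def`/`class` lines" filter, and the summary tier is left out because it would need a summarizer.

```python
from typing import Dict, Optional

MODES = ("none", "full", "manifest", "tiered")


def skeleton(content: str) -> str:
    """Crude stand-in for skeleton compression: keep def/class lines only."""
    keep = [ln for ln in content.splitlines()
            if ln.lstrip().startswith(("def ", "class "))]
    return "\n".join(keep)


def manifest_line(path: str, content: str) -> str:
    """Manifest entry: file path plus line count, no content."""
    return f"{path} ({len(content.splitlines())} lines)"


def build_context(mode: str, task: str, files: Dict[str, str],
                  tiers: Optional[Dict[str, str]] = None) -> str:
    """Assemble the prompt context for one (hypothetical) context mode."""
    if mode not in MODES:
        raise ValueError(f"unknown context mode: {mode}")
    if mode == "none":
        return task  # minimal: task description only

    parts = [task]
    for path in sorted(files):
        content = files[path]
        if mode == "tiered":
            # Assumption: tiers were chosen elsewhere; default to manifest.
            tier = (tiers or {}).get(path, "manifest")
        else:
            tier = mode  # "full" or "manifest" applies to every file

        if tier == "full":
            parts.append(f"### {path}\n{content}")
        elif tier == "skeleton":
            parts.append(f"### {path} (skeleton)\n{skeleton(content)}")
        elif tier == "manifest":
            parts.append(manifest_line(path, content))
        else:
            raise ValueError(f"unsupported tier for {path}: {tier}")
    return "\n".join(parts)
```

Keeping the mode-to-tier mapping in one function means every configuration shares the same rendering code, so measured differences come from what is included rather than how it is formatted.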

### Metrics per configuration
- Task completion rate (ok/partial/fail)
- Token usage (input + output)
- Cost
- Wall-clock time
- Turns to completion
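
The metrics above could be captured as one record per scenario run and then aggregated per configuration. The following is a sketch under assumptions: `RunResult`, `summarize`, and all field names are hypothetical, and the cost unit is left to whatever the harness reports.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

STATUSES = ("ok", "partial", "fail")


@dataclass
class RunResult:
    """One scenario run under one context configuration (hypothetical shape)."""
    status: str          # one of STATUSES
    input_tokens: int
    output_tokens: int
    cost: float          # assumed: any consistent currency unit
    wall_seconds: float
    turns: int


def summarize(results: List[RunResult]) -> Dict[str, float]:
    """Aggregate per-run metrics into per-configuration averages and rates."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    # Completion rate is reported separately for ok / partial / fail.
    summary: Dict[str, float] = {
        f"{s}_rate": sum(r.status == s for r in results) / n for s in STATUSES
    }
    summary["mean_total_tokens"] = mean(
        r.input_tokens + r.output_tokens for r in results
    )
    summary["mean_cost"] = mean(r.cost for r in results)
    summary["mean_wall_seconds"] = mean(r.wall_seconds for r in results)
    summary["mean_turns"] = mean(r.turns for r in results)
    return summary
```

Reporting the three outcome rates separately, rather than a single score, keeps partial completions visible when comparing configurations.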

### Implementation
```bash
pawbench --context-mode none       # baseline: no context injection
pawbench --context-mode full       # all files inline
pawbench --context-mode manifest   # file manifest only
pawbench --context-mode tiered     # submodular tier assignment
```

Each mode runs all scenarios; the per-mode results are then combined into a single comparison table showing the quality-cost tradeoff across modes.
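
One possible shape for that comparison table is sketched below in Python. It assumes each mode has already been reduced to a dict of aggregate metrics; `comparison_table` and the column names are hypothetical and not part of pawbench.

```python
from typing import Dict, Sequence

# Assumed column names; adjust to whatever the harness actually records.
COLUMNS = ("ok_rate", "mean_total_tokens", "mean_cost",
           "mean_wall_seconds", "mean_turns")


def comparison_table(summaries: Dict[str, Dict[str, float]],
                     columns: Sequence[str] = COLUMNS) -> str:
    """Render per-mode summaries as a markdown table, one row per mode."""
    header = "| mode | " + " | ".join(columns) + " |"
    divider = "|" + " --- |" * (len(columns) + 1)
    rows = [header, divider]
    for mode, summary in summaries.items():
        # Three significant digits keeps rates and token counts readable.
        cells = [f"{summary[c]:.3g}" for c in columns]
        rows.append(f"| {mode} | " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

Rows follow the insertion order of `summaries`, so passing the modes in the order `none`, `full`, `manifest`, `tiered` puts the baseline first.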

## Why this matters

As far as I know, no published benchmark measures context composition strategy against task success. This would be the first, and it would validate (or invalidate) techniques like manifest-first, skeleton compression, and selective retrieval.
