
Commit 3a96d31

add sitemap, pipeline notes, and data wishlist updates
1 parent 26a5458 commit 3a96d31

3 files changed

Lines changed: 648 additions & 2 deletions


DATA_WISHLIST.md

Lines changed: 68 additions & 2 deletions
@@ -120,11 +120,77 @@ The paper finds agents excel at parallelization/batching and struggle with
 vectorization. Surfacing these tags would let users filter the explorer by
 strategy.
 
-- **Used by:** `/explorer/` optional "Strategy" filter chip.
+- **Used by:** `/explorer/` optional "Strategy" filter chip; **Strategy
+  Explorer** section on the landing page (a `ToolGrid`-style 6–8 column grid,
+  one column per category, tasks listed inside each cell with ✓/✗ for whether
+  the agent matched the human's strategy).
 - **Ideal shape:** per-task labels (`["caching", "vectorization", "io"]`) on
-  the human PR.
+  the human PR. Categories from the paper: `caching`, `vectorization`,
+  `parallelization`, `batching`, `memory`, `io`, `algorithm`,
+  `data-structure`.
+- **Pairs well with:** wishlist #3 (per-task patches). The Strategy Explorer
+  becomes far more interesting if clicking a cell opens a drawer with the
+  *representative diff hunk* for that (strategy, agent) cell — even one
+  ~10-line snippet per cell is enough to read as "this is what vectorization
+  looks like in pandas."
 - **Without it:** strategy taxonomy lives only in the paper, not the site.
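
Reviewer sketch for the Strategy Explorer grid described in this wishlist item: one column per category, with each task marked ✓ or ✗ depending on whether the agent used the same strategy as the human PR. The function name `build_strategy_grid`, the task IDs, and the idea that agent patches carry labels from the same taxonomy are all assumptions made for illustration; only the category names come from the wishlist text.

```python
# Category names are taken from the wishlist entry; everything else is hypothetical.
CATEGORIES = [
    "caching", "vectorization", "parallelization", "batching",
    "memory", "io", "algorithm", "data-structure",
]


def build_strategy_grid(human_labels, agent_labels):
    """Return {category: [(task_id, matched), ...]}.

    human_labels / agent_labels map task_id -> list of strategy labels.
    A task appears in a column when the human PR carries that label;
    matched is True when the agent's labels include it too.
    """
    grid = {category: [] for category in CATEGORIES}
    for task_id, labels in human_labels.items():
        agent_set = set(agent_labels.get(task_id, []))
        for label in labels:
            if label in grid:  # ignore labels outside the taxonomy
                grid[label].append((task_id, label in agent_set))
    return grid


# Hypothetical per-task labels, shaped like the "Ideal shape" bullet.
human = {"pandas-001": ["caching", "vectorization"], "numpy-007": ["io"]}
agent = {"pandas-001": ["caching"], "numpy-007": ["io", "batching"]}

grid = build_strategy_grid(human, agent)
for category in CATEGORIES:
    cells = "  ".join(
        f"{task} {'✓' if matched else '✗'}" for task, matched in grid[category]
    )
    print(f"{category:<16}{cells}")
```

Keying the grid on the human PR's labels means an agent-only strategy (here `batching` on `numpy-007`) does not create a cell. Whether such cases should be shown is a design question the wishlist leaves open.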
 
+## 8. Agent family / model / cost taxonomy
+
+The leaderboard currently lists agents as flat IDs (e.g.
+`terminus-2,gpt-5`). For an **Agent Explorer** patterned on ccunpacked.dev's
+slash-command catalog, we need to group them by family.
+
+- **Used by:** new **Agent Explorer** section on the landing page (pill grid
+  grouped by agent family, each pill showing the agent's signature strength
+  and a cost-tier badge); future `/agents/` per-agent page.
+- **Ideal shape:** an `agents.json` keyed by agent_id:
+  ```json
+  {
+    "terminus-2,gpt-5": {
+      "agent_family": "Terminus 2",
+      "model_family": "GPT",
+      "model": "gpt-5",
+      "provider": "OpenAI",
+      "cost_tier": "frontier",
+      "open_weights": false,
+      "signature_strength": "module-level optimization",
+      "color_category": "frontier-closed"
+    }
+  }
+  ```
+- **Pairs well with:** wishlist #6 (per-task cost). Cost tier in the taxonomy
+  + per-task cost in the CSV unlocks the paper's "frontier vs. open-weights
+  cost-effectiveness" finding as a visual.
+- **Without it:** agents stay as opaque IDs; we can't surface the
+  family-level story (Terminus + frontier-LLM vs. Aider + open-weights, etc.).
+
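
Reviewer sketch of how the proposed `agents.json` could feed the Agent Explorer pill grid: load the file, check that each entry has the fields listed in wishlist item 8, and group pills by `agent_family`. The first entry is copied from the wishlist; the second entry, the `REQUIRED_FIELDS` set, and the function name `group_by_family` are hypothetical and exist only to make the grouping visible.

```python
import json
from collections import defaultdict

# First entry is from the wishlist; the second is a made-up placeholder.
AGENTS_JSON = """
{
  "terminus-2,gpt-5": {
    "agent_family": "Terminus 2",
    "model_family": "GPT",
    "model": "gpt-5",
    "provider": "OpenAI",
    "cost_tier": "frontier",
    "open_weights": false,
    "signature_strength": "module-level optimization",
    "color_category": "frontier-closed"
  },
  "terminus-2,example-open-model": {
    "agent_family": "Terminus 2",
    "model_family": "Example",
    "model": "example-open-model",
    "provider": "Example Lab",
    "cost_tier": "open",
    "open_weights": true,
    "signature_strength": "placeholder strength",
    "color_category": "open-weights"
  }
}
"""

# Assumption: every field shown in the wishlist example is mandatory.
REQUIRED_FIELDS = {
    "agent_family", "model_family", "model", "provider",
    "cost_tier", "open_weights", "signature_strength", "color_category",
}


def group_by_family(agents):
    """Return {agent_family: [pill, ...]}, rejecting incomplete entries."""
    families = defaultdict(list)
    for agent_id, meta in agents.items():
        missing = REQUIRED_FIELDS - meta.keys()
        if missing:
            raise ValueError(f"{agent_id}: missing fields {sorted(missing)}")
        families[meta["agent_family"]].append({
            "agent_id": agent_id,
            "strength": meta["signature_strength"],
            "cost_tier": meta["cost_tier"],
        })
    return dict(families)


families = group_by_family(json.loads(AGENTS_JSON))
for family, pills in families.items():
    print(family)
    for pill in pills:
        print(f"  {pill['agent_id']}: {pill['strength']} [{pill['cost_tier']}]")
```

Failing loudly on a missing field is one possible choice. A site build might instead fall back to rendering the opaque ID, which is the "Without it" behavior described in this item.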
+## 9. Structured findings catalog
+
+The paper has ~6 sharply phrased findings (local vs. global optimization,
+strategy strengths, long-tail repository performance, cost efficiency, …).
+`copy.json` already stores these as `{title, description}` pairs, but that's
+just prose. To render them as a **Findings cards grid** (the
+`HiddenFeatures` pattern from ccunpacked), each finding needs a category, a
+headline metric, and a link to where in the site that finding is *visually
+demonstrated*.
+
+- **Used by:** new **Findings** section — tinted cards with category color,
+  one-line description, headline metric chip, "View analysis ↗" link.
+- **Ideal shape:** extend `copy.json` `overview.keyFindings.findings[]`:
+  ```json
+  {
+    "title": "Local vs. Global Optimization",
+    "description": "Agents are better at local or function-level …",
+    "category": "scope",
+    "metric": { "label": "L4 advantage", "value": -0.04 },
+    "link": "/leaderboard/?level=L4"
+  }
+  ```
+- **Without it:** findings stay as plain prose cards — readable but inert,
+  with no visual emphasis on the actual numbers and no path from "claim" to
+  "evidence."
+
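
Reviewer sketch of a check that tells an extended finding (ready for a metric chip and a "View analysis" link) apart from a legacy `{title, description}` pair. The function name `card_problems` and the rule that `link` must be a site-relative path are assumptions; the field names and the sample entry, including its truncated description, come from wishlist item 9.

```python
def card_problems(finding):
    """Return a list of reasons the finding cannot render as a full card.

    An empty list means the entry has everything the Findings grid needs.
    """
    problems = []
    for key in ("title", "description", "category", "link"):
        value = finding.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty {key!r}")

    metric = finding.get("metric")
    metric_ok = (
        isinstance(metric, dict)
        and isinstance(metric.get("label"), str)
        and isinstance(metric.get("value"), (int, float))
    )
    if not metric_ok:
        problems.append("metric must look like {label: str, value: number}")

    # Assumption: findings link to pages on the same site.
    link = finding.get("link")
    if isinstance(link, str) and link and not link.startswith("/"):
        problems.append("link should be a site-relative path")
    return problems


# Sample entry copied from the wishlist (description is truncated there too).
extended = {
    "title": "Local vs. Global Optimization",
    "description": "Agents are better at local or function-level …",
    "category": "scope",
    "metric": {"label": "L4 advantage", "value": -0.04},
    "link": "/leaderboard/?level=L4",
}
# Legacy shape that copy.json stores today, per the wishlist.
legacy = {"title": extended["title"], "description": extended["description"]}

print(card_problems(extended))
print(card_problems(legacy))
```

A renderer could use the empty-list case to choose the tinted card and fall back to the plain prose card otherwise, so partially migrated `copy.json` files keep working.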
 ---
 
 ## Out of scope (intentionally)
