Commit 0b05ad4

feat: add model escalation (sonnet→opus) for task-group-implementer
1 parent 752cfb6 commit 0b05ad4

4 files changed

Lines changed: 159 additions & 25 deletions

plugins/maister/CLAUDE.md

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -455,6 +455,7 @@ Skills are automatically invoked by Claude when appropriate. Details live in eac
455455
|-------|---------|---------|
456456
| `codebase-analyzer` | Thin dispatcher: selects agent roles adaptively, launches parallel Explore subagents, delegates report synthesis to `codebase-analysis-reporter` subagent | `skills/codebase-analyzer/SKILL.md` |
457457
| `implementer` | Executes plans with **mandatory** standards reading (INDEX.md + implementation-plan.md Standards Compliance section + keyword-triggered) and **test step enforcement** (requires user approval to skip N.1 tests) | `skills/implementer/SKILL.md` |
458+
| `implementation-plan-executor` | Executes implementation plans with two-mode adaptive execution. Mode A (≤5 steps): direct. Mode B (6+ steps): delegates to `task-group-implementer` subagent with **model escalation** (sonnet → opus on BLOCKED) | `skills/implementation-plan-executor/SKILL.md` |
458459
| `implementation-verifier` | Read-only QA orchestrator: delegates completeness checks, test execution, code review, and production readiness to specialized subagents; compiles results into verification report | `skills/implementation-verifier/SKILL.md` |
459460
| `standards-discover` | Parallel multi-source standards discovery (config, code, docs, PRs/CI) with confidence scoring | `skills/standards-discover/SKILL.md` |
460461
| `docs-manager` | Internal engine for doc file operations, INDEX.md generation, CLAUDE.md integration. Not user-invocable — accessed via `docs-operator` agent (Task tool) by init, standards-update, standards-discover | `skills/docs-manager/skill.md` |
@@ -601,6 +602,7 @@ Subagents are specialized AI agents invoked by skills and orchestrators. All age
601602
| `spec-auditor` | Independent spec audit with senior auditor perspective | orchestrators | `agents/spec-auditor.md` |
602603
| `reality-assessor` | Validates work actually solves the problem | implementation-verifier | `agents/reality-assessor.md` |
603604
| `implementation-changes-planner` | Creates detailed change plans (no file modifications) | implementer | `agents/implementation-changes-planner.md` |
605+
| `task-group-implementer` | Executes a single task group: writes code, runs tests, reports status. Supports model escalation (sonnet → opus on BLOCKED). | implementation-plan-executor | `agents/task-group-implementer.md` |
604606

605607
**See**: Individual `agents/*.md` files for detailed workflows and philosophies.
606608

@@ -614,6 +616,7 @@ Subagents are specialized AI agents invoked by skills and orchestrators. All age
614616
6. **Incremental Verification**: Run only new tests after each group, not entire suite
615617
7. **Comprehensive Verification Before Commit**: Run full test suite and create verification report before code review
616618
8. **Task Directory Artifact Anchoring**: ALL workflow artifacts (reports, documentation, screenshots) MUST be saved under the task directory (`.maister/tasks/[type]/[task-name]/`). NEVER save task artifacts to project directories like `docs/`, `src/`, or project root.
619+
9. **Model Escalation**: Subagents that support escalation (currently `task-group-implementer`) start on sonnet; if BLOCKED, they are automatically retried with opus before asking the user
617620

618621
**For detailed workflow documentation, see**: individual skill `SKILL.md` files
619622

plugins/maister/agents/task-group-implementer.md

Lines changed: 30 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
---
22
name: task-group-implementer
33
description: Execute a single task group from an implementation plan with continuous standards discovery. Writes code, runs tests, returns structured execution report. Does NOT mark checkboxes - main agent handles progress tracking.
4-
model: inherit
4+
model: sonnet
55
color: green
66
---
77

@@ -25,6 +25,24 @@ Execute one task group from an implementation plan: write tests, implement code,
2525
4. **Structured reporting**: Return results in expected format for main agent
2626
5. **No progress tracking**: Do NOT mark checkboxes - main agent owns that responsibility
2727

28+
## When You're Stuck
29+
30+
It is always OK to stop and report that you can't complete the task. Bad work is worse than no work. You will not be penalized for escalating.
31+
32+
**Report BLOCKED when:**
33+
- The task requires architectural decisions with multiple valid approaches
34+
- You need to understand code beyond what was provided and can't find clarity
35+
- You feel uncertain about whether your approach is correct
36+
- The task involves restructuring existing code in ways the plan didn't anticipate
37+
- You've been reading file after file trying to understand the system without progress
38+
39+
**Report NEEDS_CONTEXT when:**
40+
- You need information about a specific file, function, or pattern not provided
41+
- The spec is ambiguous about a specific requirement
42+
- You need to know which of two approaches the project prefers
43+
44+
**How to report:** Set your status to BLOCKED or NEEDS_CONTEXT. Describe specifically what you're stuck on, what you've tried, and what kind of help you need. The coordinator can provide more context, re-dispatch with a more capable model, or break the task into smaller pieces.
45+
2846
## Decision-Making Framework
2947

3048
When facing implementation choices:
@@ -139,7 +157,7 @@ Output structured report in expected format (see Output Format section).
139157
```markdown
140158
## Group [N] Execution Report
141159

142-
### Status: [SUCCESS/PARTIAL/FAILED]
160+
### Status: [SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED]
143161

144162
### Steps Completed
145163
- [x] N.1 - [brief description]
@@ -216,15 +234,21 @@ If you encounter errors during implementation:
216234
1. **Syntax/compile errors**: Fix before proceeding
217235
2. **Missing dependencies**: Note in report, attempt reasonable fix
218236
3. **Unclear requirements**: Make reasonable choice, document in notes
219-
4. **Blocking issues**: Report FAILED status with details
237+
4. **Blocking issues**: Report BLOCKED status with details
220238

221239
### What Triggers Each Status
222240

223241
| Status | When to Use |
224242
|--------|-------------|
225243
| **SUCCESS** | All steps complete, all tests pass |
226-
| **PARTIAL** | Some steps complete, tests failing, or minor issues |
227-
| **FAILED** | Blocking issue prevents completion, needs main agent intervention |
244+
| **SUCCESS_WITH_CONCERNS** | All steps complete, but flagging doubts (e.g., file growing too large, uncertain edge case) |
245+
| **PARTIAL** | Some steps complete, tests failing, or minor issues — you made progress but couldn't finish |
246+
| **NEEDS_CONTEXT** | Missing information that wasn't provided. You know what you need — specify it precisely |
247+
| **BLOCKED** | Cannot complete due to complexity, unclear architecture, or conflicting requirements. Describe what you're stuck on and what you've tried |
248+
249+
**BLOCKED vs PARTIAL:** Use BLOCKED when the problem is reasoning/understanding (you don't know HOW), not execution (you know how but hit errors). BLOCKED triggers model escalation; PARTIAL triggers main agent investigation.
250+
251+
**NEEDS_CONTEXT vs BLOCKED:** Use NEEDS_CONTEXT when you can name the specific missing information. Use BLOCKED when you can't articulate a specific ask — you're stuck.
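
The status distinctions above can be read as a small decision procedure. The following is a minimal sketch, not part of the commit: the function name `choose_status` and its boolean parameters are hypothetical, and the order in which NEEDS_CONTEXT and BLOCKED are checked is an assumption drawn from the two comparison notes above.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "SUCCESS"
    SUCCESS_WITH_CONCERNS = "SUCCESS_WITH_CONCERNS"
    PARTIAL = "PARTIAL"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"
    BLOCKED = "BLOCKED"


def choose_status(
    all_steps_complete: bool,
    tests_pass: bool,
    has_concerns: bool = False,
    can_name_missing_info: bool = False,
    knows_how: bool = True,
) -> Status:
    """Hypothetical helper mirroring the status table (illustration only)."""
    if all_steps_complete and tests_pass:
        # Finished work: flag doubts instead of hiding them.
        return Status.SUCCESS_WITH_CONCERNS if has_concerns else Status.SUCCESS
    if can_name_missing_info:
        # A specific, nameable gap is a data problem, not a reasoning problem.
        return Status.NEEDS_CONTEXT
    if not knows_how:
        # Stuck on reasoning/understanding: this is what triggers escalation.
        return Status.BLOCKED
    # Knows how, but hit execution errors or ran out of progress.
    return Status.PARTIAL
```

For example, an agent that knows the approach but has failing tests would get `Status.PARTIAL`, while one that cannot articulate a specific ask would get `Status.BLOCKED`.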
228252

229253
## Integration
230254

@@ -279,4 +303,4 @@ During step N.3, realize auth pattern needed → Check INDEX.md → Find and rea
279303

280304
### Scenario 4: Blocking Issue
281305

282-
Can't proceed due to missing dependency or unclear spec → Report FAILED with clear explanation → Main agent will use AskUserQuestion to decide path forward
306+
Can't proceed due to complexity or conflicting requirements → Report BLOCKED with clear explanation → Main agent will escalate the model first, and use AskUserQuestion only if the escalated attempt is also BLOCKED

plugins/maister/skills/implementation-plan-executor/SKILL.md

Lines changed: 69 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -131,12 +131,42 @@ For each task group:
131131

132132
5. Use `TaskUpdate` to set the group task to `status: "completed"` with `metadata: {completed_at, tests_passed, files_modified, standards_applied}`
133133

134-
6. **If subagent reports failure**:
135-
- Do NOT auto-rollback (see Critical Principle in CLAUDE.md)
136-
- Assess: config issue? test setup? logic error?
137-
- Use AskUserQuestion for recovery path
134+
6. **Process subagent status**:
135+
136+
**SUCCESS / SUCCESS_WITH_CONCERNS**: Proceed normally. If concerns flagged, log them in work-log.
137+
138+
**PARTIAL**: Subagent made progress but couldn't finish. Assess root cause:
139+
- Test failures → analyze, apply fix if obvious, re-run
140+
- If unclear → AskUserQuestion with recovery options
138141
- Keep group task as `in_progress` with `metadata: {failed_at, failure_reason}`
139142

143+
**NEEDS_CONTEXT**: Subagent needs specific information. Read what they're asking for, provide it, and re-dispatch with the **same model** (sonnet):
144+
- Extract the specific ask from subagent output
145+
- Gather the requested context (read files, check standards, etc.)
146+
- Re-dispatch task-group-implementer with original prompt + additional context section
147+
- No model change — the problem is missing data, not reasoning
148+
149+
**BLOCKED**: Subagent is stuck on complexity/reasoning. **Escalate model**:
150+
- Re-dispatch task-group-implementer with `model: opus` parameter
151+
- Include the original prompt + subagent's BLOCKED explanation as additional context
152+
- If opus also returns BLOCKED → stop and use AskUserQuestion:
153+
```
154+
Question: "Task group [N] blocked even with escalated model. [Brief reason from subagent]. How to proceed?"
155+
Header: "Model Escalation Failed"
156+
Options:
157+
- "Break into smaller pieces" - Split this group and retry
158+
- "Provide more context" - I'll give additional information
159+
- "Skip this group" - Mark as skipped, continue
160+
- "Stop implementation" - Pause for investigation
161+
```
162+
- Log escalation in work-log: "Group N: escalated sonnet → opus. Reason: [from BLOCKED status]"
163+
164+
**Key rules:**
165+
- Never retry the same model without changes
166+
- NEEDS_CONTEXT → same model (missing data)
167+
- BLOCKED → opus (reasoning/complexity)
168+
- Opus BLOCKED → always ask user
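
The status handling in step 6 can be sketched as a pure decision function. This is an illustration under stated assumptions, not the skill's actual implementation: `Action`, `next_action`, and the action names are hypothetical, and the limit of one NEEDS_CONTEXT re-dispatch per group is taken from the "Subagent Failure (Mode B)" rules in the same file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str                    # "proceed", "investigate", "redispatch", or "ask_user"
    model: Optional[str] = None  # set only for "redispatch"


def next_action(status: str, model: str = "sonnet", context_redispatches: int = 0) -> Action:
    """Hypothetical coordinator decision for one subagent report."""
    if status in ("SUCCESS", "SUCCESS_WITH_CONCERNS"):
        return Action("proceed")
    if status == "PARTIAL":
        # Analyze the failure; ask the user only if the root cause is unclear.
        return Action("investigate")
    if status == "NEEDS_CONTEXT":
        # Missing data, not reasoning: keep the model, allow one re-dispatch.
        if context_redispatches >= 1:
            return Action("ask_user")
        return Action("redispatch", model)
    if status == "BLOCKED":
        # Reasoning limit: escalate once, then hand over to the user.
        if model == "opus":
            return Action("ask_user")
        return Action("redispatch", "opus")
    raise ValueError(f"unknown status: {status}")
```

Keeping the decision separate from the dispatch itself makes the "never retry the same model without changes" rule easy to check: every `redispatch` either changes the model or follows newly gathered context.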
169+
140170
## Continuous Standards Discovery
141171
142172
**Philosophy**: Standards are discovered when relevant, not memorized upfront.
@@ -237,14 +267,42 @@ You have access to `.maister/docs/INDEX.md` for continuous standards discovery.
237267
[See Subagent Output Format section]
238268
```
239269

270+
### Re-dispatch on BLOCKED (Model Escalation)
271+
272+
When re-dispatching with opus after BLOCKED:
273+
274+
````markdown
275+
## Task: Execute Task Group [N] (Escalated)
276+
277+
**Previous attempt status**: BLOCKED
278+
**Previous attempt explanation**: [paste BLOCKED explanation from subagent]
279+
**Model**: opus (escalated from sonnet)
280+
281+
### Task Group Content
282+
[Same as original dispatch]
283+
284+
### Specification Excerpt
285+
[Same as original dispatch]
286+
287+
### Standards
288+
[Same as original dispatch]
289+
290+
### Additional Context
291+
[Any context gathered based on the BLOCKED explanation]
292+
293+
### Requirements
294+
[Same as original dispatch, plus:]
295+
5. You are running on a more capable model because the previous attempt was blocked. Use your additional reasoning capability to work through the complexity described above.
296+
````
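
As a rough illustration of how a coordinator might assemble the escalated prompt from the template above, here is a minimal sketch. The function `build_escalated_prompt` and its parameters are hypothetical, and the `(none)` placeholder for empty additional context is an assumption; the section headings and the closing requirement text are copied from the template.

```python
from typing import Sequence


def build_escalated_prompt(
    group_number: int,
    task_group_content: str,
    spec_excerpt: str,
    standards: str,
    requirements: Sequence[str],
    blocked_explanation: str,
    additional_context: str = "",
) -> str:
    """Hypothetical builder for the escalated re-dispatch prompt."""
    escalation_note = (
        "You are running on a more capable model because the previous attempt "
        "was blocked. Use your additional reasoning capability to work through "
        "the complexity described above."
    )
    # Original requirements keep their order; the escalation note is appended last.
    numbered = [
        f"{i}. {text}"
        for i, text in enumerate(list(requirements) + [escalation_note], start=1)
    ]
    sections = [
        f"## Task: Execute Task Group {group_number} (Escalated)",
        "**Previous attempt status**: BLOCKED\n"
        f"**Previous attempt explanation**: {blocked_explanation}\n"
        "**Model**: opus (escalated from sonnet)",
        f"### Task Group Content\n{task_group_content}",
        f"### Specification Excerpt\n{spec_excerpt}",
        f"### Standards\n{standards}",
        f"### Additional Context\n{additional_context or '(none)'}",
        "### Requirements\n" + "\n".join(numbered),
    ]
    return "\n\n".join(sections)
```

With four original requirements, the escalation note becomes item 5, which matches the numbering shown in the template.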
297+
240298
## Subagent Output Format
241299

242300
The task-group-implementer returns structured output:
243301

244302
```markdown
245303
## Group [N] Execution Report
246304

247-
### Status: [SUCCESS/PARTIAL/FAILED]
305+
### Status: [SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED]
248306

249307
### Steps Completed
250308
- [x] N.1 - [description]
@@ -355,22 +413,14 @@ After each task group:
355413

356414
### Subagent Failure (Mode B)
357415

358-
If task-group-implementer reports failure:
416+
Subagent status handling is defined in Mode B step 6 above. Additional rules:
359417

360418
1. **Do NOT auto-rollback** - User-confirmed rollback only
361-
2. **Analyze root cause** from subagent output
362-
3. **Check for easy fixes**: config issues, missing dependencies, test setup
363-
4. **Use AskUserQuestion**:
364-
```
365-
Question: "Group [N] implementation failed: [brief reason]. How to proceed?"
366-
Header: "Failure"
367-
Options:
368-
- "Try suggested fix" - [if easy fix identified]
369-
- "Retry group" - Re-invoke subagent
370-
- "Complete manually" - Switch to direct execution for this group
371-
- "Rollback changes" - Revert this group's changes
372-
- "Stop" - Pause for investigation
373-
```
419+
2. **Model escalation is automatic** - BLOCKED → opus happens without asking user
420+
3. **User involvement triggers**:
421+
- Opus returns BLOCKED (end of escalation chain)
422+
- PARTIAL status with unclear root cause
423+
- NEEDS_CONTEXT returned again after one re-dispatch with added context (max 1 NEEDS_CONTEXT re-dispatch per group → AskUserQuestion)
374424

375425
### Test Failure
376426

plugins/maister/skills/orchestrator-framework/references/orchestrator-patterns.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -324,3 +324,60 @@ If prerequisites missing, use AskUserQuestion: "Start from Phase 1", "Specify di
324324
| User chooses "Proceed with known issues" | Proceed with warning logged |
325325
| Max iterations (3) reached | Ask user how to proceed |
326326
| Critical issues remain unresolved | **MUST NOT proceed** — require user approval first |
327+
328+
---
329+
330+
## 7. Model Escalation Pattern
331+
332+
When a subagent reports BLOCKED status, the coordinator re-dispatches the task with a more capable model. The first escalation tier (sonnet → opus) is automatic and needs no user confirmation.
333+
334+
### Escalation Chain
335+
336+
````
337+
sonnet (default) → BLOCKED → opus → BLOCKED → AskUserQuestion
338+
````
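
One way to model the chain is as an ordered tuple with a lookup for the next tier. This is a minimal sketch assuming the two-tier chain shown above; `ESCALATION_CHAIN` and `escalate` are hypothetical names, and returning `None` is simply how this sketch signals "end of chain, ask the user".

```python
from typing import Optional

# Ordered from default to most capable (assumed two-tier chain).
ESCALATION_CHAIN = ("sonnet", "opus")


def escalate(model: str) -> Optional[str]:
    """Return the next model after a BLOCKED report, or None at the end of the chain."""
    index = ESCALATION_CHAIN.index(model)  # raises ValueError for unknown models
    if index + 1 < len(ESCALATION_CHAIN):
        return ESCALATION_CHAIN[index + 1]
    return None  # most capable model was BLOCKED: the coordinator asks the user
```

Expressing the chain as data means a further tier could be added by editing the tuple, without changing the lookup.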
339+
340+
### Status-to-Action Mapping
341+
342+
| Subagent Status | Action | Model Change |
343+
|----------------|--------|--------------|
344+
| SUCCESS / SUCCESS_WITH_CONCERNS | Proceed | None |
345+
| PARTIAL | Investigate, fix if obvious, ask user if unclear | None |
346+
| NEEDS_CONTEXT | Provide requested context, re-dispatch | Same model |
347+
| BLOCKED | Re-dispatch with more capable model | sonnet → opus |
348+
349+
### Key Rules
350+
351+
1. **Never retry same model without changes** — if BLOCKED, something must change (model, context, or task scope)
352+
2. **NEEDS_CONTEXT ≠ BLOCKED** — missing data → same model; reasoning limit → higher model
353+
3. **End of chain → user** — when the most capable model is BLOCKED, always AskUserQuestion
354+
4. **Log escalations** — record in work-log for visibility and cost tracking
355+
5. **No automatic rollback** — BLOCKED does not mean "undo what was done"
356+
357+
### When to Apply
358+
359+
This pattern applies to any agent that:
360+
- Has `model: sonnet` in frontmatter (not `inherit` or `opus`)
361+
- Implements the enriched status protocol (SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED)
362+
- Is dispatched by a coordinator skill that processes the output
363+
364+
Currently applies to:
365+
- `task-group-implementer` (dispatched by `implementation-plan-executor`)
366+
367+
### Re-dispatch Prompt Structure
368+
369+
When escalating, the coordinator includes:
370+
- Original task prompt (unchanged)
371+
- Previous attempt's BLOCKED explanation
372+
- Any additional context gathered
373+
- Note that this is an escalated dispatch with a more capable model
374+
375+
### Anti-Patterns
376+
377+
| Anti-Pattern | Why It's Wrong |
378+
|--------------|----------------|
379+
| Retrying same model on BLOCKED | Wastes tokens, same result |
380+
| Escalating on NEEDS_CONTEXT | Problem is data, not reasoning — provide context first |
381+
| Escalating on PARTIAL | Subagent made progress — investigate the specific failure |
382+
| Skipping user when opus is BLOCKED | End of chain, user must decide next step |
383+
| Auto-rollback on BLOCKED | BLOCKED means "stuck", not "failed" — work may be partially valid |
