feat: add model escalation (sonnet→opus) for task-group-implementer

mariuszs · web-flow · commit 0b05ad48f5da · 2026-03-26T11:56:50.000+01:00
diff --git a/plugins/maister/CLAUDE.md b/plugins/maister/CLAUDE.md
@@ -455,6 +455,7 @@ Skills are automatically invoked by Claude when appropriate. Details live in eac
 |-------|---------|---------|
 | `codebase-analyzer` | Thin dispatcher: selects agent roles adaptively, launches parallel Explore subagents, delegates report synthesis to `codebase-analysis-reporter` subagent | `skills/codebase-analyzer/SKILL.md` |
 | `implementer` | Executes plans with **mandatory** standards reading (INDEX.md + implementation-plan.md Standards Compliance section + keyword-triggered) and **test step enforcement** (requires user approval to skip N.1 tests) | `skills/implementer/SKILL.md` |
+| `implementation-plan-executor` | Executes implementation plans with two-mode adaptive execution. Mode A (≤5 steps): direct. Mode B (6+ steps): delegates to `task-group-implementer` subagent with **model escalation** (sonnet → opus on BLOCKED) | `skills/implementation-plan-executor/SKILL.md` |
 | `implementation-verifier` | Read-only QA orchestrator: delegates completeness checks, test execution, code review, and production readiness to specialized subagents; compiles results into verification report | `skills/implementation-verifier/SKILL.md` |
 | `standards-discover` | Parallel multi-source standards discovery (config, code, docs, PRs/CI) with confidence scoring | `skills/standards-discover/SKILL.md` |
 | `docs-manager` | Internal engine for doc file operations, INDEX.md generation, CLAUDE.md integration. Not user-invocable — accessed via `docs-operator` agent (Task tool) by init, standards-update, standards-discover | `skills/docs-manager/skill.md` |
@@ -601,6 +602,7 @@ Subagents are specialized AI agents invoked by skills and orchestrators. All age
 | `spec-auditor` | Independent spec audit with senior auditor perspective | orchestrators | `agents/spec-auditor.md` |
 | `reality-assessor` | Validates work actually solves the problem | implementation-verifier | `agents/reality-assessor.md` |
 | `implementation-changes-planner` | Creates detailed change plans (no file modifications) | implementer | `agents/implementation-changes-planner.md` |
+| `task-group-implementer` | Executes a single task group: writes code, runs tests, reports status. Supports model escalation (sonnet → opus on BLOCKED). | implementation-plan-executor | `agents/task-group-implementer.md` |
 
 **See**: Individual `agents/*.md` files for detailed workflows and philosophies.
 
@@ -614,6 +616,7 @@ Subagents are specialized AI agents invoked by skills and orchestrators. All age
 6. **Incremental Verification**: Run only new tests after each group, not entire suite
 7. **Comprehensive Verification Before Commit**: Run full test suite and create verification report before code review
 8. **Task Directory Artifact Anchoring**: ALL workflow artifacts (reports, documentation, screenshots) MUST be saved under the task directory (`.maister/tasks/[type]/[task-name]/`). NEVER save task artifacts to project directories like `docs/`, `src/`, or project root.
+9. **Model Escalation**: Subagents start on sonnet; if BLOCKED, automatically retry with opus before asking the user
 
 **For detailed workflow documentation, see**: individual skill `SKILL.md` files
 
diff --git a/plugins/maister/agents/task-group-implementer.md b/plugins/maister/agents/task-group-implementer.md
@@ -1,7 +1,7 @@
 ---
 name: task-group-implementer
 description: Execute a single task group from an implementation plan with continuous standards discovery. Writes code, runs tests, returns structured execution report. Does NOT mark checkboxes - main agent handles progress tracking.
-model: inherit
+model: sonnet
 color: green
 ---
 
@@ -25,6 +25,24 @@ Execute one task group from an implementation plan: write tests, implement code,
 4. **Structured reporting**: Return results in expected format for main agent
 5. **No progress tracking**: Do NOT mark checkboxes - main agent owns that responsibility
 
+## When You're Stuck
+
+It is always OK to stop and report that you can't complete the task. Bad work is worse than no work. You will not be penalized for escalating.
+
+**Report BLOCKED when:**
+- The task requires architectural decisions with multiple valid approaches
+- You need to understand code beyond what was provided and can't find clarity
+- You feel uncertain about whether your approach is correct
+- The task involves restructuring existing code in ways the plan didn't anticipate
+- You've been reading file after file trying to understand the system without progress
+
+**Report NEEDS_CONTEXT when:**
+- You need information about a specific file, function, or pattern not provided
+- The spec is ambiguous about a specific requirement
+- You need to know which of two approaches the project prefers
+
+**How to report:** Set your status to BLOCKED or NEEDS_CONTEXT. Describe specifically what you're stuck on, what you've tried, and what kind of help you need. The coordinator can provide more context, re-dispatch with a more capable model, or break the task into smaller pieces.
+
 ## Decision-Making Framework
 
 When facing implementation choices:
@@ -139,7 +157,7 @@ Output structured report in expected format (see Output Format section).
 ```markdown
 ## Group [N] Execution Report
 
-### Status: [SUCCESS/PARTIAL/FAILED]
+### Status: [SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED]
 
 ### Steps Completed
 - [x] N.1 - [brief description]
@@ -216,15 +234,21 @@ If you encounter errors during implementation:
 1. **Syntax/compile errors**: Fix before proceeding
 2. **Missing dependencies**: Note in report, attempt reasonable fix
 3. **Unclear requirements**: Make reasonable choice, document in notes
-4. **Blocking issues**: Report FAILED status with details
+4. **Blocking issues**: Report BLOCKED status with details
 
 ### What Triggers Each Status
 
 | Status | When to Use |
 |--------|-------------|
 | **SUCCESS** | All steps complete, all tests pass |
-| **PARTIAL** | Some steps complete, tests failing, or minor issues |
-| **FAILED** | Blocking issue prevents completion, needs main agent intervention |
+| **SUCCESS_WITH_CONCERNS** | All steps complete, but flagging doubts (e.g., file growing too large, uncertain edge case) |
+| **PARTIAL** | Some steps complete, tests failing, or minor issues — you made progress but couldn't finish |
+| **NEEDS_CONTEXT** | Missing information that wasn't provided. You know what you need — specify it precisely |
+| **BLOCKED** | Cannot complete due to complexity, unclear architecture, or conflicting requirements. Describe what you're stuck on and what you've tried |
+
+**BLOCKED vs PARTIAL:** Use BLOCKED when the problem is reasoning/understanding (you don't know HOW), not execution (you know how but hit errors). BLOCKED triggers model escalation; PARTIAL triggers main agent investigation.
+
+**NEEDS_CONTEXT vs BLOCKED:** Use NEEDS_CONTEXT when you can name the specific missing information. Use BLOCKED when you can't articulate a specific ask — you're stuck.
 
 ## Integration
 
@@ -279,4 +303,4 @@ During step N.3, realize auth pattern needed → Check INDEX.md → Find and rea
 
 ### Scenario 4: Blocking Issue
 
-Can't proceed due to missing dependency or unclear spec → Report FAILED with clear explanation → Main agent will use AskUserQuestion to decide path forward
+Can't proceed because the approach is unclear or requirements conflict → Report BLOCKED with clear explanation → Main agent will escalate the model first, then use AskUserQuestion if the escalated attempt is also BLOCKED
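
Reviewer note: the coordinator now branches on five status values, so the `### Status:` line of the execution report is effectively a small protocol. The following is a minimal Python sketch of reading it, assuming the report keeps the heading format from the Output Format template. `Status` and `parse_status` are hypothetical names for illustration and are not part of the plugin.

```python
import re
from enum import Enum


class Status(Enum):
    """The five status values in the updated report template."""
    SUCCESS = "SUCCESS"
    SUCCESS_WITH_CONCERNS = "SUCCESS_WITH_CONCERNS"
    PARTIAL = "PARTIAL"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"
    BLOCKED = "BLOCKED"


# Matches a line such as "### Status: BLOCKED" (optional square brackets allowed).
_STATUS_LINE = re.compile(r"^### Status:[ \t]*\[?([A-Z_]+)\]?[ \t]*$", re.MULTILINE)


def parse_status(report: str) -> Status:
    """Return the status declared in the report's '### Status:' line."""
    match = _STATUS_LINE.search(report)
    if match is None:
        raise ValueError("report has no '### Status:' line")
    # Enum lookup raises ValueError for values outside the protocol,
    # including the retired FAILED status.
    return Status(match.group(1))
```

Rejecting unknown values outright means a subagent that still emits the old `FAILED` status surfaces as an error instead of being silently treated as success.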
diff --git a/plugins/maister/skills/implementation-plan-executor/SKILL.md b/plugins/maister/skills/implementation-plan-executor/SKILL.md
@@ -131,12 +131,42 @@ For each task group:
 
 5. Use `TaskUpdate` to set the group task to `status: "completed"` with `metadata: {completed_at, tests_passed, files_modified, standards_applied}`
 
-6. **If subagent reports failure**:
-   - Do NOT auto-rollback (see Critical Principle in CLAUDE.md)
-   - Assess: config issue? test setup? logic error?
-   - Use AskUserQuestion for recovery path
+6. **Process subagent status**:
+
+   **SUCCESS / SUCCESS_WITH_CONCERNS**: Proceed normally. If concerns flagged, log them in work-log.
+
+   **PARTIAL**: Subagent made progress but couldn't finish. Assess root cause:
+   - Test failures → analyze, apply fix if obvious, re-run
+   - If unclear → AskUserQuestion with recovery options
    - Keep group task as `in_progress` with `metadata: {failed_at, failure_reason}`
 
+   **NEEDS_CONTEXT**: Subagent needs specific information. Read what they're asking for, provide it, and re-dispatch with the **same model** (sonnet):
+   - Extract the specific ask from subagent output
+   - Gather the requested context (read files, check standards, etc.)
+   - Re-dispatch task-group-implementer with original prompt + additional context section
+   - No model change — the problem is missing data, not reasoning
+
+   **BLOCKED**: Subagent is stuck on complexity/reasoning. **Escalate model**:
+   - Re-dispatch task-group-implementer with `model: opus` parameter
+   - Include the original prompt + subagent's BLOCKED explanation as additional context
+   - If opus also returns BLOCKED → stop and use AskUserQuestion:
+     ```
+     Question: "Task group [N] blocked even with escalated model. [Brief reason from subagent]. How to proceed?"
+     Header: "Model Escalation Failed"
+     Options:
+     - "Break into smaller pieces" - Split this group and retry
+     - "Provide more context" - I'll give additional information
+     - "Skip this group" - Mark as skipped, continue
+     - "Stop implementation" - Pause for investigation
+     ```
+   - Log escalation in work-log: "Group N: escalated sonnet → opus. Reason: [from BLOCKED status]"
+
+   **Key rules:**
+   - Never retry the same model without changes
+   - NEEDS_CONTEXT → same model (missing data)
+   - BLOCKED → opus (reasoning/complexity)
+   - Opus BLOCKED → always ask user
+
 ## Continuous Standards Discovery
 
 **Philosophy**: Standards are discovered when relevant, not memorized upfront.
@@ -237,14 +267,42 @@ You have access to `.maister/docs/INDEX.md` for continuous standards discovery.
 [See Subagent Output Format section]
 ```
 
+### Re-dispatch on BLOCKED (Model Escalation)
+
+When re-dispatching with opus after BLOCKED:
+
+````markdown
+## Task: Execute Task Group [N] (Escalated)
+
+**Previous attempt status**: BLOCKED
+**Previous attempt explanation**: [paste BLOCKED explanation from subagent]
+**Model**: opus (escalated from sonnet)
+
+### Task Group Content
+[Same as original dispatch]
+
+### Specification Excerpt
+[Same as original dispatch]
+
+### Standards
+[Same as original dispatch]
+
+### Additional Context
+[Any context gathered based on the BLOCKED explanation]
+
+### Requirements
+[Same as original dispatch, plus:]
+5. You are running on a more capable model because the previous attempt was blocked. Use your additional reasoning capability to work through the complexity described above.
+````
+
 ## Subagent Output Format
 
 The task-group-implementer returns structured output:
 
 ```markdown
 ## Group [N] Execution Report
 
-### Status: [SUCCESS/PARTIAL/FAILED]
+### Status: [SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED]
 
 ### Steps Completed
 - [x] N.1 - [description]
@@ -355,22 +413,14 @@ After each task group:
 
 ### Subagent Failure (Mode B)
 
-If task-group-implementer reports failure:
+Subagent status handling is defined in Mode B step 6 above. Additional rules:
 
 1. **Do NOT auto-rollback** - User-confirmed rollback only
-2. **Analyze root cause** from subagent output
-3. **Check for easy fixes**: config issues, missing dependencies, test setup
-4. **Use AskUserQuestion**:
-   ```
-   Question: "Group [N] implementation failed: [brief reason]. How to proceed?"
-   Header: "Failure"
-   Options:
-   - "Try suggested fix" - [if easy fix identified]
-   - "Retry group" - Re-invoke subagent
-   - "Complete manually" - Switch to direct execution for this group
-   - "Rollback changes" - Revert this group's changes
-   - "Stop" - Pause for investigation
-   ```
+2. **Model escalation is automatic** - BLOCKED → opus happens without asking user
+3. **User involvement triggers**:
+   - Opus returns BLOCKED (end of escalation chain)
+   - PARTIAL status with unclear root cause
+   - NEEDS_CONTEXT returned again after one re-dispatch with added context (max 1 NEEDS_CONTEXT re-dispatch per group, then AskUserQuestion)
 
 ### Test Failure
 
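
Reviewer note: the step 6 rules plus the "Additional rules" above amount to a small decision table. The sketch below restates them as a pure Python function to make the branches easy to check. It is an illustration under assumptions, not the skill's implementation: `Decision`, `decide`, and the action labels are hypothetical, and keeping the current model on NEEDS_CONTEXT after an escalation is an assumption, since the diff only describes the sonnet case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str                  # "proceed", "investigate", "redispatch", or "ask_user"
    model: Optional[str] = None  # model to use when action is "redispatch"


def decide(status: str, model: str, context_redispatches: int) -> Decision:
    """Map a subagent status to the coordinator's next action.

    model: the model the subagent just ran on ("sonnet" or "opus").
    context_redispatches: NEEDS_CONTEXT re-dispatches already used for this group.
    """
    if status in ("SUCCESS", "SUCCESS_WITH_CONCERNS"):
        return Decision("proceed")
    if status == "PARTIAL":
        # Fix if the cause is obvious; otherwise the coordinator asks the user.
        return Decision("investigate")
    if status == "NEEDS_CONTEXT":
        if context_redispatches >= 1:
            return Decision("ask_user")    # max 1 context re-dispatch per group
        return Decision("redispatch", model)  # same model: missing data, not reasoning
    if status == "BLOCKED":
        if model == "sonnet":
            return Decision("redispatch", "opus")  # automatic escalation
        return Decision("ask_user")        # end of the escalation chain
    raise ValueError(f"unknown status: {status}")
```

For example, `decide("BLOCKED", "sonnet", 0)` yields a re-dispatch on opus, while the same status coming back from opus yields `ask_user`.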
diff --git a/plugins/maister/skills/orchestrator-framework/references/orchestrator-patterns.md b/plugins/maister/skills/orchestrator-framework/references/orchestrator-patterns.md
@@ -324,3 +324,60 @@ If prerequisites missing, use AskUserQuestion: "Start from Phase 1", "Specify di
 | User chooses "Proceed with known issues" | Proceed with warning logged |
 | Max iterations (3) reached | Ask user how to proceed |
 | Critical issues remain unresolved | **MUST NOT proceed** — require user approval first |
+
+---
+
+## 7. Model Escalation Pattern
+
+When a subagent reports BLOCKED status, the coordinator re-dispatches the task with a more capable model. The first tier of escalation (sonnet → opus) is automatic and needs no user confirmation.
+
+### Escalation Chain
+
+````
+sonnet (default) → BLOCKED → opus → BLOCKED → AskUserQuestion
+````
+
+### Status-to-Action Mapping
+
+| Subagent Status | Action | Model Change |
+|----------------|--------|--------------|
+| SUCCESS / SUCCESS_WITH_CONCERNS | Proceed | None |
+| PARTIAL | Investigate, fix if obvious, ask user if unclear | None |
+| NEEDS_CONTEXT | Provide requested context, re-dispatch | Same model |
+| BLOCKED | Re-dispatch with more capable model | sonnet → opus |
+
+### Key Rules
+
+1. **Never retry same model without changes** — if BLOCKED, something must change (model, context, or task scope)
+2. **NEEDS_CONTEXT ≠ BLOCKED** — missing data → same model; reasoning limit → higher model
+3. **End of chain → user** — when the most capable model is BLOCKED, always AskUserQuestion
+4. **Log escalations** — record in work-log for visibility and cost tracking
+5. **No automatic rollback** — BLOCKED does not mean "undo what was done"
+
+### When to Apply
+
+This pattern applies to any agent that:
+- Has `model: sonnet` in frontmatter (not `inherit` or `opus`)
+- Implements the enriched status protocol (SUCCESS/SUCCESS_WITH_CONCERNS/PARTIAL/NEEDS_CONTEXT/BLOCKED)
+- Is dispatched by a coordinator skill that processes the output
+
+Currently applies to:
+- `task-group-implementer` (dispatched by `implementation-plan-executor`)
+
+### Re-dispatch Prompt Structure
+
+When escalating, the coordinator includes:
+- Original task prompt (unchanged)
+- Previous attempt's BLOCKED explanation
+- Any additional context gathered
+- Note that this is an escalated dispatch with a more capable model
+
+### Anti-Patterns
+
+| Anti-Pattern | Why It's Wrong |
+|--------------|----------------|
+| Retrying same model on BLOCKED | Wastes tokens, same result |
+| Escalating on NEEDS_CONTEXT | Problem is data, not reasoning — provide context first |
+| Escalating on PARTIAL | Subagent made progress — investigate the specific failure |
+| Skipping user when opus is BLOCKED | End of chain, user must decide next step |
+| Auto-rollback on BLOCKED | BLOCKED means "stuck", not "failed" — work may be partially valid |
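
Reviewer note: the first "When to Apply" criterion (`model: sonnet` in frontmatter, not `inherit` or `opus`) can be checked mechanically. The Python sketch below assumes the agent file starts with a `---`-delimited frontmatter block of simple `key: value` lines, as in `agents/task-group-implementer.md`. `frontmatter_model` and `has_escalation_frontmatter` are hypothetical helpers. The other two criteria (status protocol, coordinator dispatch) are not checked here.

```python
from typing import Optional


def frontmatter_model(agent_markdown: str) -> Optional[str]:
    """Return the `model` value from the leading frontmatter block, if any."""
    lines = agent_markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # file does not start with frontmatter
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; ignore anything in the body
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model":
            return value.strip()
    return None


def has_escalation_frontmatter(agent_markdown: str) -> bool:
    """True only when the agent is pinned to sonnet, the start of the chain."""
    return frontmatter_model(agent_markdown) == "sonnet"
```

An agent left on `model: inherit` would return `False` here, which matches the frontmatter change this commit makes to `task-group-implementer`.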