RLCR: stagnation circuit breaker fires several rounds late; 10 methodology gaps observed #183

@chenzongyao200127

Context

This issue is filed from a sanitized methodology analysis of
an RLCR session that ran 15 rounds (0-14) of a 42-round budget
before the stagnation circuit breaker terminated it. The session
split into three clean phases:

  • Phase A (rounds 0-5): productive iteration; each round closed
    concrete reviewer-flagged gaps with measurable test coverage.
  • Phase B (rounds 6-9): narrowing returns; each round still closed
    a discrete issue, but the issues were increasingly localized to
    one operator-handoff script.
  • Phase C (rounds 10-14): process churn around an unbreakable
    structural blocker. No code, no tests, no acceptance progress —
    only process artifacts, a host-probe artifact, and an
    audit-quality re-capture of that same probe. Two STALLED
    verdicts and one user "cancel" answer were issued in this phase;
    none broke the loop until the explicit circuit breaker fired at
    round 14.

The honest read: the loop should have ended at round 10. The
four-round delay between the first substantive stagnation signal
and the circuit breaker is exactly the kind of process overhead
the suggestions below are designed to prevent.

Suggested methodology improvements

  1. Hard stagnation gate — an explicit reviewer stagnation
    warning should force the next round into a narrow
    exit-or-escalate objective, not permit another process-state
    round.

  2. Terminal-direction preservation — when a user issues a
    terminal direction (e.g. "cancel") followed by a fallback in
    the same turn, the terminal direction should be preserved as
    the default state if the fallback fails. Today the fallback
    silently supersedes the terminal direction, and the loop never
    returns to honor the original intent if the fallback is
    blocked.

  3. Acknowledged-guardrail-violation circuit breaker — when
    the agent's own round contract explicitly acknowledges that
    the round objective violates the round prompt's guardrails
    because every other option is excluded, the harness should
    treat that acknowledgment as a hard stop. "Permit the
    violation with a note" is the wrong direction of escape.

  4. Defensive-prose ratio metric — per-round measurement of
    the ratio of defensive-justification text ("this round is not
    stagnation / not churn / not self-deferral / ...") to
    concrete-change text, surfaced to the reviewer as a
    churn-candidate signal. Defensive prose volume was a clean
    leading indicator of non-productive rounds in this session.

  5. External-action verdict category — reviewer should be able
    to distinguish "loop cannot close this from here" from "loop
    should try again." Today every NOT COMPLETE verdict is treated
    the same way and the loop produces process artifacts in
    response to directives it cannot execute.

  6. Scope-amendment user option — after N rounds blocked on
    the same gap with no plausible in-loop path, the user-direction
    surfacing protocol should include "amend the blocking
    acceptance criterion" as an offered option. Treating
    immutability as absolute even in the face of architectural
    impossibility forces every round into one of pretend / churn /
    stall.

  7. Directive-plan executability classification — the round
    contract should require explicit classification of each
    numbered step from the previous review's directive plan as
    {executable in this round, blocked by named external factor,
    requires user decision}. If all numbered steps are blocked or
    require user decision, the round should be forced into a
    narrow user-decision objective rather than permitted to
    substitute adjacent work.

  8. Audit-quality-only round prohibition — a round whose
    mainline objective is improving the audit quality of an
    already-captured piece of evidence (without changing any
    factual conclusion) should be disallowed as a mainline. Such
    work belongs in a post-acceptance polish phase or batched
    cleanup at session end.

  9. Frozen-test-count signal — a configurable threshold of
    consecutive rounds with no test-count change AND no code
    change should trigger an automatic stagnation alert. The
    harness appears to use "tests still green" as a proxy for
    "round is healthy," but a frozen test count combined with no
    behavioral changes is itself a stagnation signal.

  10. Session-exit artifact — on any loop termination
    (convergence, circuit breaker, or user cancel), emit a
    session-level summary of what was shipped across the whole
    session, what remains open, and what external action is
    required to close the open work. Distinct from any
    individual round summary.
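
Suggestions 7 and 9 are mechanical enough to sketch. The following is a minimal illustration only, not the harness's actual implementation: `RoundRecord`, `StepStatus`, `frozen_streak`, `stagnation_alert`, `forced_objective`, and the default threshold of 3 are all hypothetical names and values chosen for this example, and it assumes the harness can record a per-round test count and a code-changed flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class StepStatus(Enum):
    """Hypothetical classification of one directive-plan step (suggestion 7)."""
    EXECUTABLE = "executable in this round"
    BLOCKED_EXTERNAL = "blocked by named external factor"
    NEEDS_USER = "requires user decision"


@dataclass(frozen=True)
class RoundRecord:
    """Per-round facts the harness is assumed to be able to record."""
    index: int
    test_count: int
    code_changed: bool


def frozen_streak(rounds: Sequence[RoundRecord]) -> int:
    """Count trailing rounds with no test-count change AND no code change."""
    streak = 0
    # Walk backwards, comparing each round with the one before it.
    for i in range(len(rounds) - 1, 0, -1):
        cur, prev = rounds[i], rounds[i - 1]
        if cur.code_changed or cur.test_count != prev.test_count:
            break
        streak += 1
    return streak


def stagnation_alert(rounds: Sequence[RoundRecord], threshold: int = 3) -> bool:
    """Suggestion 9: alert once the frozen streak reaches the threshold."""
    return frozen_streak(rounds) >= threshold


def forced_objective(statuses: Sequence[StepStatus]) -> Optional[str]:
    """Suggestion 7: force a user-decision round if no step is executable."""
    if statuses and all(s is not StepStatus.EXECUTABLE for s in statuses):
        return "user-decision"
    return None
```

With a check like this, "tests still green" alone would not count as a healthy round: a green suite whose size and code have not moved for `threshold` consecutive rounds would raise the alert, and a directive plan with zero executable steps would narrow the next round to a user decision instead of adjacent work.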

Cross-cutting observations

  • Review tier was strong but had no "good enough" escape
    valve. The reviewer correctly identified real issues every
    round; what it lacked was a way to say "the residual gap is no
    longer the implementation team's problem to close."
  • Contract authoring was load-bearing but inflexible. In the
    productive phase, written round contracts with specific success
    criteria worked extremely well. In the churn phase the
    contracts became increasingly creative about defining
    achievable objectives within constraints, which is the wrong
    direction: the constraints had become incompatible with
    progress, and the contract should have surfaced that
    incompatibility rather than worked around it.
  • Required-ceremony sections expand to fill defensive space.
    The BitLesson-Delta section in every round summary defaulted
    to "none" with extensive justifications, often longer than the
    substantive work of the round — the same defensive-prose
    pattern as suggestion 4 at a smaller scale.
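
The defensive-prose ratio from suggestion 4 could be approximated very roughly as below. This is a sketch under stated assumptions: the phrase list is taken from the examples quoted in suggestion 4, the function name `defensive_ratio` is hypothetical, and it measures the fraction of sentences containing a defensive phrase rather than a true defensive-to-concrete-change text ratio, which would need a better classifier.

```python
import re

# Assumed phrase list, taken from the examples quoted in suggestion 4.
DEFENSIVE_PATTERNS = (
    r"\bnot stagnation\b",
    r"\bnot churn\b",
    r"\bnot self-deferral\b",
)


def defensive_ratio(summary: str) -> float:
    """Fraction of sentences in a round summary that contain a defensive phrase."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    if not sentences:
        return 0.0
    defensive = sum(
        1
        for s in sentences
        if any(re.search(p, s, re.IGNORECASE) for p in DEFENSIVE_PATTERNS)
    )
    return defensive / len(sentences)
```

A harness could surface this number to the reviewer per round, or per required section such as BitLesson-Delta, and flag rounds where it climbs while the code and test counts stay flat.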

Acknowledgments

The report was generated by an Opus subagent reading round
summaries and review results from a single RLCR session. All
project-specific identifiers (file paths, function names, domain
terms, repository identifiers) were stripped at the analysis
stage. This issue text contains no project-identifying
information.
