From 58b39e213036fe3fad53eb0048e6203755154b94 Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 17:29:20 +0530
Subject: [PATCH 01/13] feat(v1): structural checkers + semantic-context search
 + checker-strength diagnostics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

WP3/WP4/WP2/WP6 of the v0.11.0 -> V1 strengthening program (research-report driven).

- C3 py-signature:/py-const: structural checkers (dorian/pyast.py): AST-based, close
  the symbol-existence and string/regex comment-survival ceilings; gutted-body remains
  the documented ceiling (only C4 catches a body change behind an unchanged signature).
- C3 code: semantic-context regex over comment/docstring-stripped Python (same ReDoS
  worker-timeout as regex:), so a fact surviving only in a comment/docstring FAILs.
- checker-strength / claim-risk diagnostics (dorian/strength.py): classify truth
  strength per checker, flag kind-vs-strength adequacy mismatches, advisory C4
  zero/constant-assertion lint; surfaced in `dorian bindings` (JSON + human) and the
  opt-in --binding-gate warn output. Advisory only — never a verdict/trust/exit change.
- watch derivation + binding diagnostics recognize the new C3 forms (seal, bindings).
- spec/checkers.md + docs/AGENT_CLAIMS.md document the new grammars and the ceiling.

+58 tests (test_pystructural, test_semantic_context, test_strength); 619 non-slow pass.
ERROR-vs-FAIL discipline preserved; trigger-vs-truth split made explicit, not blurred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
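Reviewer note (not part of the commit): the `py-const:` bullet above says the
constant is compared by value, not by text. The sketch below illustrates that idea
under stated assumptions. `const_verdict` is a hypothetical helper, not the code in
`dorian/pyast.py`; it handles only module-level `NAME = <literal>` assignments and
returns the verdict as a plain string.

```python
import ast


def const_verdict(source: str, name: str, expected: str) -> str:
    """Hypothetical helper: PASS/FAIL/ERROR for a module-level `name = <literal>`."""
    try:
        tree = ast.parse(source)
        want = ast.literal_eval(expected)
    except (SyntaxError, ValueError):
        return "ERROR"  # unparseable target or malformed expected literal

    rhs = None
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    rhs = stmt.value  # keep going: the last assignment wins

    if rhs is None:
        return "FAIL"  # the constant is gone
    try:
        got = ast.literal_eval(rhs)
    except ValueError:
        return "ERROR"  # non-literal RHS: the value cannot be determined
    return "PASS" if got == want else "FAIL"


# Integer base and quote style do not matter, because values are compared.
print(const_verdict("TIMEOUT = 0x1E\n", "TIMEOUT", "30"))
# A mention that survives only in a comment cannot satisfy the check.
print(const_verdict("# TIMEOUT = 30\nTIMEOUT = 60\n", "TIMEOUT", "30"))
# A computed value is reported as ERROR rather than a guess.
print(const_verdict("TIMEOUT = compute()\n", "TIMEOUT", "30"))
```

A raw-text `regex:` such as `TIMEOUT\s*=\s*30` would match the comment in the second
case, which is the false pass the structural form is meant to rule out.
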
 V1_IMPLEMENTATION_TRACKER.md   | 111 +++++++++++++
 docs/AGENT_CLAIMS.md           |   5 +-
 spec/checkers.md               |  44 +++++
 src/dorian/bindings.py         |   5 +-
 src/dorian/checkers/c3_ref.py  |  62 ++++++-
 src/dorian/commands.py         |  33 +++-
 src/dorian/pyast.py            | 293 +++++++++++++++++++++++++++++++++
 src/dorian/seal.py             |  11 +-
 src/dorian/strength.py         | 223 +++++++++++++++++++++++++
 tests/test_pystructural.py     | 290 ++++++++++++++++++++++++++++++++
 tests/test_semantic_context.py | 106 ++++++++++++
 tests/test_strength.py         | 219 ++++++++++++++++++++++++
 12 files changed, 1393 insertions(+), 9 deletions(-)
 create mode 100644 V1_IMPLEMENTATION_TRACKER.md
 create mode 100644 src/dorian/pyast.py
 create mode 100644 src/dorian/strength.py
 create mode 100644 tests/test_pystructural.py
 create mode 100644 tests/test_semantic_context.py
 create mode 100644 tests/test_strength.py
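Reviewer note (not part of the commit): the commit message describes `py-signature:`
as stronger than `symbol:` while leaving the gutted-body case as the documented
ceiling. The sketch below shows both halves on a toy function. `symbol_exists` and
`param_names` are hypothetical helpers written for this note; they cover only
module-level functions, compare parameter names only, and ignore `*args`/`**kwargs`,
annotations, and defaults, so they are a simplification of what the patch implements.

```python
from __future__ import annotations

import ast
import re


def symbol_exists(source: str, name: str) -> bool:
    """Existence-only check, in the spirit of the `symbol:` regex."""
    return re.search(rf"\b(def|class)\s+{re.escape(name)}\b", source) is not None


def param_names(source: str, func: str) -> list[str] | None:
    """Parameter names of a module-level function, or None if it is missing."""
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func:
            args = node.args
            return [a.arg for a in (*args.posonlyargs, *args.args, *args.kwonlyargs)]
    return None


ORIGINAL = "def verify_token(token, algo):\n    return check(token, algo)\n"
RENAMED = "def verify_token(tok, algo):\n    return check(tok, algo)\n"
GUTTED = "def verify_token(token, algo):\n    return True\n"

for label, src in (("original", ORIGINAL), ("renamed", RENAMED), ("gutted", GUTTED)):
    print(label, symbol_exists(src, "verify_token"), param_names(src, "verify_token"))
```

The existence check is true for all three variants. The parameter comparison separates
the renamed variant from the original but cannot tell the gutted one apart, which is
why behavior claims still need a C4 `pytest:` checker.
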

diff --git a/V1_IMPLEMENTATION_TRACKER.md b/V1_IMPLEMENTATION_TRACKER.md
new file mode 100644
index 0000000..3cf4254
--- /dev/null
+++ b/V1_IMPLEMENTATION_TRACKER.md
@@ -0,0 +1,111 @@
+# V1 implementation tracker
+
+Working tracker for the v0.11.0 → V1 strengthening program driven by
+`RESEARCH_REPORT_DORIAN_0_11_0.md`. Behavior is verified against the **current
+code**, not the report; where they disagree, code wins and the disagreement is
+recorded here.
+
+## Phase 0 — version gate + scope evidence
+
+**Version gate: PASSED.**
+
+| Surface | Observed |
+|---|---|
+| `pyproject.toml` `[project].version` | `0.11.0` |
+| `src/dorian/__init__.py` `__version__` | `0.11.0` |
+| branch | `main` |
+| commit SHA (start) | `78dcd1a6a242110e55dc31fd1db2e811de3e3898` |
+| working tree | clean except untracked `.claude/`, `AGENTS.md`, `CLAUDE.md`, `RESEARCH_REPORT_DORIAN_0_11_0.md` |
+| Python | 3.12.4 |
+| toolchain | `uv` 0.5.9; `uv run pytest`; ruff for lint/format |
+| baseline tests | `uv run pytest -m "not slow"` → **561 passed, exit 0**; 636 total incl. slow |
+
+## Phase 1 — baseline reconstruction (from current code)
+
+### Module map
+- `model.py` — `Warrant`/`Claim`/`CheckerSpec`/`ReadSetEntry`, content-addressed id, canonical JSON. `CheckerType = C1|C3|C4|C5` (a *Literal* hint; registry dispatch is on the string `type`).
+- `checkers/base.py` — `run_checker` is the single dispatch + the single execution-policy gate (blocked → `Verdict.ERROR`).
+- `checkers/c1_span.py` — span anchor, relocation-tolerant, optional c2lite.
+- `checkers/c3_ref.py` — `path:` / `symbol:` / `string:` / `regex:`; regex match in a spawn-killed worker (ReDoS backstop).
+- `checkers/c4_test.py` — `pytest:<nodeid>`, careful exit-code mapping; ERROR≠FAIL.
+- `checkers/c5_data.py` — typed data forms + opaque `shell:`.
+- `policy.py` — `ExecutionPolicy`, `executable_kind` (single source of "what executes": C4=pytest, C5 shell=shell).
+- `seal.py` — born-verifiable seal; scope lint; watch derivation; additive symbol-definer widening; duplicate-id reject; atomic write; idempotent re-seal.
+- `revalidate.py` — changed-path discovery, rename persistence, cheapest-first checks (C1<C3<C5<C4), fold, recall fanout; ERROR→ERRORED.
+- `fold.py` — `fold()` pure fn → TRUSTED/DEGRADED/REVOKED/UNKNOWN. (Born state is `WARRANTED`, set at seal.)
+- `bindings.py` — binding diagnostics + opt-in `--binding-gate` (off/warn/fail). Flags: unbacked, single-file, short-literal, ambiguous-mention, trigger-only-symbol, unwatched-mention.
+- `symbol_index.py` — Python symbol→definer index + pyproject console-script index; ambiguity skipped.
+- `gitio.py` — git plumbing incl. `file_at_ref` (needed for trusted-base).
+- `commands.py` / `cli.py` — command surface; exit codes 0/2/3/4/5/6.
+- `store.py` / `blast.py` / `report.py` — derived SQLite, lineage, audit JSONL.
+
+### Trust-boundary map
+- Non-executable: C1, C3, typed C5. Executable: C4 `pytest:`, C5 `shell:`.
+- `--deny-exec`/`--deny-shell` (+ env) are fail-closed, NOT a sandbox. Blocked → ERROR.
+- Sidecars are source of truth; SQLite derived (`sync` rebuilds).
+- Action runs checkers from the **checked-out (head)** sidecars → trusted/internal only today; trusted-base is design-only (`docs/TRUSTED_BASE_ACTION_DESIGN.md`).
+
+### Benchmark/docs freshness map
+- `docs/BENCHMARK_v0.7.0.md` — title-stamped **v0.7.0**, synthetic. HISTORICAL.
+- `docs/BENCHMARK_BINDING_LIFECYCLE.md` — header `dorian 0.9.0`, run_id, 808 pairs. HISTORICAL.
+- `docs/PUBLIC_BENCHMARK_PROTOCOL.md` — protocol only, no results.
+- No current-version (0.11+) result doc exists.
+
+### Report findings verified against code (code wins)
+- **README `WARRANTED -> REVOKED` is NOT drift.** Report (medium-confidence) called it stale. Verified: `fold.fold()` only emits TRUSTED/DEGRADED/REVOKED/UNKNOWN; the *born* trust state is `WARRANTED` (set at seal); the first fold therefore renders `WARRANTED -> <new>`. `tests/test_render_md.py:168-169` pins `WARRANTED -> REVOKED` and `WARRANTED -> UNKNOWN` as correct md output. Action: **do not "fix"; add a short trust-state vocabulary note to remove reader confusion.**
+- **C4 adequacy blind spot** — report marks INFERENCE; confirmed: `c4_test.py` maps pytest exit codes only, no assertion/relevance inspection. Valid advisory target (WP6).
+- **PyPI install wording** — report marks UNVERIFIED. Per project state, dorian is NOT on PyPI; README "until the first PyPI release … install from source" is accurate. Keep.
+
+## Report coverage matrix (every material finding classified)
+
+Categories: IMPL=must-implement · TEST=must-test regression · DOC=must-document · BENCH=must-benchmark · BOUNDARY=honest non-goal · DONE=already in v0.11.0 · DEFER=post-V1/blocked.
+
+| # | Report finding / recommendation | Category | Current evidence | Planned action | Acceptance/verification | Status |
+|---|---|---|---|---|---|---|
+| 1 | README trust-state vocab (WARRANTED vs TRUSTED/…) | DOC | code correct; README lacks a glossary | add trust-state legend; keep examples | docs test + render_md tests stay green | TODO |
+| 2 | ERROR must never collapse into BROKEN | DONE+TEST | base/fold/revalidate all enforce | keep; add a guard test if any new path | existing + new ERROR≠BROKEN tests | TODO |
+| 3 | C1 span + c2lite regression | DONE | test_c1.py | none (keep green) | test_c1 passes | DONE |
+| 4 | C3 regex ReDoS timeout regression | DONE | test_c3_regex_timeout.py (slow) | none | passes | DONE |
+| 5 | C3 symbol existence ceiling / gutted-body | IMPL+DOC | symbol: existence-only | add `py-signature:` structural checker (WP3) | gutted-body PASS under symbol, FAIL under signature when sig changes; body-only stays PASS (documented ceiling) | TODO |
+| 6 | C3 string/regex comment/docstring survival | IMPL+DOC | raw text search | add semantic code-context search mode (WP4) | literal only in comment/docstring → FAIL in code mode | TODO |
+| 7 | C4 pytest vacuous/zero-assertion adequacy | IMPL | none | advisory adequacy lint (WP6) | zero-assertion / assert-True node warns; normal test does not | TODO |
+| 8 | C5 typed grammar limits / snapshot brittleness | BOUNDARY+DOC | documented | document in V1-meaning; optional structural data checker DEFER | doc states grammar bounds | TODO |
+| 9 | duplicate claim-id rejection | DONE | seal.py step 0 | keep | test_seal covers | DONE |
+| 10 | scope-lint named-read-set-only limitation | DONE+DOC | SECURITY_BOUNDARY | keep wording | docs test | DONE |
+| 11 | deny-exec/deny-shell fail-closed, not sandbox | DONE | policy.py, docs | keep | test_deny_exec_policy | DONE |
+| 12 | sidecar source-of-truth vs SQLite derived | DONE | seal/revalidate/sync | keep | test_store/sync | DONE |
+| 13 | canonical JSON / content-addressed identity | DONE | model.compute_id + Warrant.load integrity | keep | test_model/determinism | DONE |
+| 14 | atomic no-write on failed seal | DONE | seal os.replace + refusal order | keep | test_seal/deny_exec | DONE |
+| 15 | changed-path discovery + persisted rename | DONE | revalidate + store rename_log | keep | test_revalidate | DONE |
+| 16 | checker ordering + FAIL vs ERROR discipline | DONE | revalidate _check_claim | keep | existing | DONE |
+| 17 | fold + blast/recall lineage | DONE | fold.py, blast.py | keep | test_fold/test_blast | DONE |
+| 18 | audit/state separate-transaction limitation | BOUNDARY | fold.py docstring documents it | document in V1-meaning as known limitation | doc names it | TODO |
+| 19 | binding ambiguity handling | DONE | symbol_index ambiguous_symbol_mentions + flag | keep; extend provenance (WP5) | test_symbol_index | DONE |
+| 20 | oversized/unparseable file diagnostics | IMPL | silently skipped today | surface multi-index unparse diagnostics (WP5) loudly | giant/unparseable supported file → diagnostic not silent | TODO |
+| 21 | pyproject script binding | DONE | pyproject_script_definers | keep | test_symbol_index | DONE |
+| 22 | watch glob over/under-match risk | TEST | _covered glob logic | add a glob over/under test if WP5 touches it | test | TODO |
+| 23 | public/fork self-attested verdict risk | IMPL+DOC | head-mode only | trusted-base checker-source (WP7) | exploit fixtures: PR-added/modified exec checker not run; non-exec rewrite surfaced | TODO |
+| 24 | trusted-base design + non-sandbox caveat | IMPL+DOC | design-only | implement `--checker-source base` + Action input; keep non-sandbox caveat | WP7 test matrix | TODO |
+| 25 | historical benchmark docs (v0.7.0, v0.9.0) | DOC | unlabeled as historical in body | add HISTORICAL banner; README cross-link labels | docs wording test | TODO |
+| 26 | public benchmark protocol w/o results | DOC | protocol only | keep; note in current-results doc | unchanged | TODO |
+| 27 | current-version benchmark rerun | BENCH | none | rerun + version-stamped `BENCHMARK_CURRENT.md` | bench smoke + stamp present | TODO |
+| 28 | extractor remains draft/experimental | DONE | README + AGENT_CLAIMS | keep; do not promote | docs test | DONE |
+| 29 | release/install-status uncertainty | DOC | README source-install accurate | keep; V1 release report states status | report | TODO |
+| 30 | checker-strength / claim-risk visibility | IMPL | bindings flags exist but no strength score | strength + claim-risk diagnostics (WP2) | behavior+symbol → adequacy-mismatch; unbacked load-bearing → high risk | TODO |
+| 31 | multi-index binding (routes/config/etc.) | IMPL | python+script only | config-key index (WP5), provenance-tagged | config-key change selects claim; ambiguous skipped+warned | TODO |
+| 32 | warrant-quality mutation harness | BENCH | repo-level bench only | `dorian bench warrant-quality` (WP8) | deterministic per-claim trigger/truth score on fixture | TODO |
+
+## Work-package status (live)
+
+| WP | Title | Status |
+|---|---|---|
+| WP1 | docs/evidence hygiene | TODO |
+| WP2 | checker-strength / claim-risk linter | TODO |
+| WP3 | Python structural checkers (py-signature, py-const) | TODO |
+| WP4 | semantic-context source search | TODO |
+| WP5 | multi-index binding (config-key) | TODO |
+| WP6 | C4 test-adequacy lint | TODO |
+| WP7 | trusted-base checker-source mode | TODO |
+| WP8 | warrant-quality mutation harness | TODO |
+| WP9 | current-version benchmark results | TODO |
+| WP10 | V1 release prep / decision | TODO |
diff --git a/docs/AGENT_CLAIMS.md b/docs/AGENT_CLAIMS.md
index 18202e2..ccb4c24 100644
--- a/docs/AGENT_CLAIMS.md
+++ b/docs/AGENT_CLAIMS.md
@@ -120,7 +120,10 @@ ignores `timeout_s`, and a backtracking pattern can stall `revalidate`.
 
 The authoritative grammar is [`spec/checkers.md`](../spec/checkers.md). In brief:
 
-- **C3** — `path:<p>` · `symbol:<file>::<name>` · `string:<file>::<literal>` · `regex:<file>::<pattern>`
+- **C3** — `path:<p>` · `symbol:<file>::<name>` · `string:<file>::<literal>` · `regex:<file>::<pattern>` · `py-signature:<file>::<qualname>::<sigspec>` · `py-const:<file>::<qualname>::<literal>` · `code:<file>::<pattern>`
+  - **`py-signature:`** is stronger than `symbol:` for "function `X` takes args `…`": it compares the parsed signature (names/order/kind always; annotations/defaults/return/async only when you state them), so a parameter rename or default change FAILs where `symbol:` still passes. A body-only change is the documented ceiling — back behavior claims with `C4 pytest:`.
+  - **`py-const:`** is stronger than `regex:` for "`X` is 30": it compares the assignment's literal **value** via the AST, so a comment or docstring mention can never pass and `30`/`0x1E` are equal. A non-literal RHS ERRORs (use a different checker).
+  - **`code:`** is `regex:` over comment/docstring-stripped Python — use it when a `regex:` would false-pass on a fact that survives only in a comment or docstring. Python-only.
 - **C4** — `pytest:<nodeid>` (a nodeid is `file::test`)
 - **C5** — `rowcount:<f>::<op><n>` · `schema:<f>::c1,c2` · `nullrate:<f>::<col>::<op><x>` · `domain:<f>::<col>::{a,b}` · `freshness:<f>::<col>::>= <ISO>` · `snapshot:<f>` · `reconcile:<A>~~<B>` · `shell:<cmd>` (needs explicit `watch` + `expect`)
 - **C1** — a span anchor; its `program` is a read-set entry id. **Not** auto-capturable by `verify`.
diff --git a/spec/checkers.md b/spec/checkers.md
index eb02acf..2ccee4e 100644
--- a/spec/checkers.md
+++ b/spec/checkers.md
@@ -35,6 +35,11 @@ symbol:<file>::<name>          PASS iff \b(def|class)\s+<name>\b matches the fil
 string:<file>::<literal>       PASS iff the literal substring is present
 regex:<file>::<pattern>        PASS iff re.search(pattern, text, re.MULTILINE)
                                hits the LF-normalized file text
+py-signature:<file>::<qualname>::<sigspec>   structural (Python AST): the function
+                               or method has the stated signature
+py-const:<file>::<qualname>::<literal>       structural (Python AST): the module or
+                               class assignment has the stated literal value
+code:<file>::<pattern>         semantic regex over comment/docstring-stripped Python
 ```
 
 The operand of `string:`/`regex:` may itself contain `:`; only the prefix and
@@ -44,6 +49,45 @@ both `TIMEOUT = 30` and `TIMEOUT=30`). When a `string:` check FAILs but a line
 nearly matches, the detail carries a near-miss hint (line number and
 similarity ratio only, never file content) pointing at `regex:`.
 
+### Python structural forms (`py-signature:`, `py-const:`)
+
+`symbol:` proves a name still **exists**; it cannot see a signature change, and
+`string:`/`regex:` search raw text, so a fact surviving only in a comment,
+docstring, or dead literal still passes. The two structural forms parse the
+target's AST (stdlib `ast`, read-only, no execution) and compare structure or
+literal **values**, so they tolerate formatting (whitespace, quote style, integer
+base) and cannot be satisfied by a comment/docstring mention.
+
+- `py-signature:<file>::<qualname>::<sigspec>` — `<qualname>` is a dotted path to a
+  function/method (`verify_token`, `Auth.login`). `<sigspec>` is the parameter list
+  exactly as it would appear inside `def f(...)` — e.g. `token`, `token, algo`,
+  `token: str, algo: str = "RS256"` — optionally suffixed with `-> <ret>` and/or
+  prefixed with `async`. Parameter **names, order, and kind** are always compared;
+  per-parameter **annotations** and **defaults**, the **return annotation**, and
+  **async-ness** are compared **only when the spec states them** (a names-only spec
+  ignores the rest). FAIL on a signature drift or a missing function; ERROR on an
+  unparseable target or a malformed `<sigspec>`.
+- `py-const:<file>::<qualname>::<literal>` — `<qualname>` is a module-level or
+  class-level assignment target (`TIMEOUT`, `C.LIMIT`). Compares the assignment's
+  literal value by `ast.literal_eval`, so `30` matches `0x1E` and `"RS256"` matches
+  `'RS256'`. FAIL on a value drift or a missing constant; **ERROR** when the RHS is
+  not a literal (the value cannot be determined — never a vacuous PASS).
+
+**Documented ceiling:** `py-signature:` is blind to a body-only ("gutted body")
+change — the signature is unchanged, so it PASSes. Only a C4 `pytest:` test catches
+a behavior change behind an unchanged signature. Binding/structure widens *what is
+checked*, never *proves behavior*.
+
+### Semantic-context form (`code:`)
+
+`code:<file>::<pattern>` runs a regex (same 500-char cap, compile guard, and
+worker-process timeout as `regex:`) over a copy of the **Python** file with comments
+and docstrings blanked out. Real string literals (a route path in a dict key, a call
+or decorator argument) are kept, so `code:src/routes.py::/v1/login` matches the route
+in code but a `TIMEOUT = 30` that survives only in a comment FAILs (`code_missing`).
+Python-only: a non-parseable / non-Python target ERRORs (`code_unparseable`), never a
+silent pass. Derived watch: the referenced file.
+
 Regex DoS: `regex:` patterns are length-bounded (500 chars) and compile-guarded,
 AND the match runs in a spawned worker process killed at the checker's
 `timeout_s` (default 30s). A pathological nested-quantifier pattern that triggers
diff --git a/src/dorian/bindings.py b/src/dorian/bindings.py
index 0677e0e..976cb91 100644
--- a/src/dorian/bindings.py
+++ b/src/dorian/bindings.py
@@ -197,7 +197,8 @@ def _checker_named_files(claim: Claim, entry_uris: dict[str, str]) -> set[str]:
     symbol-definer watch paths added at verify time. A watch path NOT in this set is a
     re-check TRIGGER that no checker exercises — the binding fix's trigger != truth gap,
     which the 'trigger-only-symbol' flag surfaces."""
-    from dorian.seal import _c5_data_paths  # lazy: reuse the canonical C5 path grammar
+    # lazy: reuse seal's canonical C3 file-operand form set and C5 path grammar
+    from dorian.seal import _C3_FILE_OPERAND_FORMS, _c5_data_paths
 
     named: set[str] = set()
     for spec in claim.checkers:
@@ -207,7 +208,7 @@ def _checker_named_files(claim: Claim, entry_uris: dict[str, str]) -> set[str]:
             if uri:
                 named.add(uri)
         elif spec.type == "C3":
-            named.add(rest.partition("::")[0] if prefix in ("symbol", "string", "regex") else rest)
+            named.add(rest.partition("::")[0] if prefix in _C3_FILE_OPERAND_FORMS else rest)
         elif spec.type == "C4" and prefix == "pytest":
             named.add(rest.partition("::")[0].strip())  # parity with seal._derive_watch
         elif spec.type == "C5":
diff --git a/src/dorian/checkers/c3_ref.py b/src/dorian/checkers/c3_ref.py
index c2a2e5f..fe06e35 100644
--- a/src/dorian/checkers/c3_ref.py
+++ b/src/dorian/checkers/c3_ref.py
@@ -7,6 +7,21 @@
 - string:<file>::<literal>      PASS iff the literal substring is present.
 - regex:<file>::<pattern>       PASS iff re.search(pattern, text, re.MULTILINE)
                                 hits the LF-normalized file text.
+- py-signature:<file>::<qualname>::<sigspec>   structural (Python AST): the named
+                                function/method has the stated parameters (and, when
+                                given, annotations/defaults/return/async). FAIL on a
+                                signature drift; ERROR on an unparseable target or
+                                malformed spec. Stronger than `symbol:` (which is
+                                existence-only); the body-only "gutted" change is the
+                                documented ceiling — only a C4 test catches that.
+- py-const:<file>::<qualname>::<literal>       structural (Python AST): the named
+                                module/class assignment has the stated LITERAL value
+                                (compared by value, so quote style / int base / spacing
+                                are tolerated, and a comment/docstring mention cannot
+                                pass). FAIL on a value drift; ERROR on a non-literal RHS.
+
+The `py-*` structural forms parse the file's AST (`dorian.pyast`); they only read the
+target and never execute it. See `dorian/pyast.py` and `spec/checkers.md`.
 
 `regex:` is the shape-tolerant form: prefer it over `string:` for facts that must
 survive reformatting (the v0.0 false-positive class — e.g. 'TIMEOUT\\s*=\\s*30'
@@ -42,11 +57,16 @@
 import re
 from pathlib import Path
 
+from dorian import pyast
 from dorian._regex_worker import MATCH, NO_MATCH, WORKER_ERROR, search_worker
 from dorian.checkers import registry
 from dorian.checkers.base import CheckContext, CheckResult, Verdict, resolve_path
 from dorian.model import CheckerSpec, lf_normalize
 
+# C3 grammar prefixes. `path` takes a bare path; the rest take `<file>::<operand>`.
+_FILE_OPERAND_FORMS = ("symbol", "string", "regex", "py-signature", "py-const", "code")
+_VERDICT = {"PASS": Verdict.PASS, "FAIL": Verdict.FAIL, "ERROR": Verdict.ERROR}
+
 _MAX_PATTERN_LEN = 500  # cheap guard against catastrophic patterns
 _NEAR_MISS_RATIO = 0.8
 _NEAR_MISS_MAX_FILE_BYTES = 1 << 20  # 1 MiB: bound the per-line scan
@@ -132,7 +152,7 @@ def _string_fail(path: Path, text: str, literal: str) -> CheckResult:
 
 def check(ctx: CheckContext, spec: CheckerSpec) -> CheckResult:
     prefix, sep, rest = spec.program.partition(":")
-    if not sep or prefix not in ("path", "symbol", "string", "regex"):
+    if not sep or (prefix != "path" and prefix not in _FILE_OPERAND_FORMS):
         return CheckResult(Verdict.ERROR, detail="bad_program")
 
     if prefix == "path":
@@ -148,7 +168,7 @@ def check(ctx: CheckContext, spec: CheckerSpec) -> CheckResult:
         return CheckResult(Verdict.ERROR, detail="bad_program")
 
     pattern: re.Pattern[str] | None = None
-    if prefix == "regex":
+    if prefix in ("regex", "code"):  # both are regex over text; same DoS guards
         if len(needle) > _MAX_PATTERN_LEN:
             return CheckResult(Verdict.ERROR, detail="bad_program")
         try:
@@ -168,6 +188,44 @@ def check(ctx: CheckContext, spec: CheckerSpec) -> CheckResult:
             return CheckResult(Verdict.PASS)
         return CheckResult(Verdict.FAIL, detail="symbol_missing")
 
+    if prefix == "py-signature":
+        verdict, detail = pyast.check_signature(text, needle)
+        return CheckResult(_VERDICT[verdict], detail=detail)
+
+    if prefix == "py-const":
+        verdict, detail = pyast.check_const(text, needle)
+        return CheckResult(_VERDICT[verdict], detail=detail)
+
+    if prefix == "code":
+        assert pattern is not None  # compiled above to validate before we spawn
+        code_text = pyast.code_only_python(text)
+        if code_text is None:
+            return CheckResult(
+                Verdict.ERROR,
+                detail="code_unparseable (code: strips comments/docstrings from"
+                " Python; this target is not parseable Python)",
+            )
+        status = _search_with_timeout(needle, re.MULTILINE, code_text, spec.timeout_s)
+        if status == "match":
+            return CheckResult(Verdict.PASS)
+        if status == "nomatch":
+            return CheckResult(
+                Verdict.FAIL,
+                detail="code_missing (not present in code; comments/docstrings ignored)",
+            )
+        if status == "timeout":
+            return CheckResult(
+                Verdict.ERROR,
+                detail=f"regex_timeout (>{spec.timeout_s}s — catastrophic backtracking?)",
+            )
+        if status == "spawn_error":
+            return CheckResult(
+                Verdict.ERROR,
+                detail="regex_spawn_error (regex worker process failed to start;"
+                " an embedder needs a spawn-safe __main__ guard)",
+            )
+        return CheckResult(Verdict.ERROR, detail="regex_error")
+
     if prefix == "regex":
         assert pattern is not None  # compiled above to validate before we spawn
         status = _search_with_timeout(needle, re.MULTILINE, text, spec.timeout_s)
diff --git a/src/dorian/commands.py b/src/dorian/commands.py
index fc0343d..dff6848 100644
--- a/src/dorian/commands.py
+++ b/src/dorian/commands.py
@@ -23,7 +23,7 @@
 from collections import Counter
 from pathlib import Path
 
-from dorian import bindings, claims_io, datachecks, gitio, store, symbol_index
+from dorian import bindings, claims_io, datachecks, gitio, store, strength, symbol_index
 from dorian.blast import blast_conn
 from dorian.capture.manual import parse_manual
 from dorian.capture.transcript import parse_transcript
@@ -94,6 +94,18 @@ def _emit_binding_gate_warnings(prog: str, repo: Path, artifact_uri: str, mode:
         " (weak binding is a review smell, not proof a claim is false)",
         file=sys.stderr,
     )
+    # checker-strength / claim-risk is the truth-axis companion to binding flags:
+    # binding says WHEN a claim re-checks; strength says whether the checker can
+    # falsify it. Advisory only — never changes the seal verdict or exit code.
+    try:
+        claims = list(Warrant.load(repo / (artifact_uri + ".warrant")).claims)
+    except (gitio.GitError, *_SIDECAR_ERRORS):
+        return
+    sdiags = strength.analyze(repo, claims, {d["claim_id"]: d["flags"] for d in diags})
+    print(f"{prog}: {strength.summary_line(sdiags)}", file=sys.stderr)
+    for s in sdiags:
+        for note in s["adequacy"]:
+            print(f"{prog}: {s['claim_id']}: {note}", file=sys.stderr)
 
 
 def _print_binding_gate_refusal(prog: str, exc: BindingGateError) -> None:
@@ -425,6 +437,19 @@ def cmd_bindings(args: argparse.Namespace) -> int:
     except _SIDECAR_ERRORS as exc:
         print(f"dorian bindings: corrupt warrant sidecar: {exc}", file=sys.stderr)
         return EXIT_REVOKED
+    # attach the truth-axis diagnostics (checker strength + claim risk) per claim:
+    # binding flags say WHEN a claim re-checks, strength says whether the checker can
+    # falsify it. Advisory; never a gate (bindings always exits 0 when readable).
+    try:
+        claims = list(Warrant.load(repo / (uri + ".warrant")).claims)
+        sdiags = {
+            s["claim_id"]: s
+            for s in strength.analyze(repo, claims, {d["claim_id"]: d["flags"] for d in diags})
+        }
+    except _SIDECAR_ERRORS:
+        sdiags = {}
+    for d in diags:
+        d["strength"] = sdiags.get(d["claim_id"])
     if args.json:
         print(json.dumps({"artifact_uri": uri, "claims": diags}, sort_keys=True))
         return EXIT_OK
@@ -434,6 +459,12 @@ def cmd_bindings(args: argparse.Namespace) -> int:
         print(f"{d['claim_id']}  flags: {', '.join(d['flags']) or 'none'}")
         for m in d["mentions"]:
             print(f"  {m['token']} -> unwatched: {', '.join(m['unwatched_files'])}")
+        s = d.get("strength")
+        if s:
+            reasons = f" ({', '.join(s['reasons'])})" if s["reasons"] else ""
+            print(f"  strength: {s['strength']}  risk: {s['risk']}{reasons}")
+            for note in s["adequacy"]:
+                print(f"  {note}")
     print(f"{len(diags)} claim(s), {flagged} flagged")
     return EXIT_OK
 
diff --git a/src/dorian/pyast.py b/src/dorian/pyast.py
new file mode 100644
index 0000000..fb2c4b0
--- /dev/null
+++ b/src/dorian/pyast.py
@@ -0,0 +1,293 @@
+"""Deterministic Python structural comparisons over the stdlib `ast` (no execution).
+
+Backs the C3 structural subgrammars `py-signature:` and `py-const:`. Both parse the
+target file's AST and compare *structure* / *literal values*, so they are tolerant
+of formatting (whitespace, quote style, integer base) and cannot be satisfied by a
+mention in a comment, docstring, or dead string literal — the documented weak-verdict
+ceiling of `symbol:`/`string:`/`regex:`.
+
+Each entry point returns ``(verdict, detail)`` where ``verdict`` is ``"PASS"`` /
+``"FAIL"`` / ``"ERROR"`` (the caller maps to ``CheckResult``). The split mirrors the
+checker contract exactly:
+- **FAIL** — a real drift: the function/constant is gone, or its signature/value no
+  longer matches the claim.
+- **ERROR** — the checker could not run: an unparseable target, a malformed program,
+  or a non-literal constant whose value cannot be compared. ERROR is never a vacuous
+  PASS and never a false FAIL.
+
+Stdlib only (``ast``, ``io``, ``tokenize``); never imports or executes the target.
+"""
+
+from __future__ import annotations
+
+import ast
+import io
+import tokenize
+
+# ast.parse on a pathological (but importable) file can raise these instead of a
+# SyntaxError; treat them as "could not run", never as drift.
+_PARSE_ERRORS = (SyntaxError, ValueError, RecursionError, MemoryError)
+
+_SCOPE_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
+
+
+def code_only_python(text: str) -> str | None:
+    """Return `text` with comments and docstrings blanked to spaces (line count and
+    column offsets preserved), or None if it is not parseable Python.
+
+    Real string literals — a route path in a dict key, a call argument, a decorator
+    argument — are KEPT: only ``#`` comments and bare-string docstring statements are
+    removed. This is what makes `code:` reject a fact that survives only in a comment
+    or docstring while still matching the same fact in actual code.
+    """
+    tree = _parse(text)
+    if tree is None:
+        return None
+    doc_start_lines: set[int] = set()
+    for node in ast.walk(tree):
+        if isinstance(node, _SCOPE_NODES):
+            body = getattr(node, "body", None)
+            if (
+                isinstance(body, list)
+                and body
+                and isinstance(body[0], ast.Expr)
+                and isinstance(body[0].value, ast.Constant)
+                and isinstance(body[0].value.value, str)
+            ):
+                doc_start_lines.add(body[0].value.lineno)
+
+    buf = [list(line) for line in text.split("\n")]
+
+    def blank(start: tuple[int, int], end: tuple[int, int]) -> None:
+        (sl, sc), (el, ec) = start, end
+        for ln in range(sl, el + 1):
+            if ln - 1 >= len(buf):
+                break
+            row = buf[ln - 1]
+            lo = sc if ln == sl else 0
+            hi = ec if ln == el else len(row)
+            for i in range(lo, min(hi, len(row))):
+                row[i] = " "
+
+    try:
+        for tok in tokenize.generate_tokens(io.StringIO(text).readline):
+            if tok.type == tokenize.COMMENT or (
+                tok.type == tokenize.STRING and tok.start[0] in doc_start_lines
+            ):
+                blank(tok.start, tok.end)
+    except (tokenize.TokenError, IndentationError, SyntaxError):
+        pass  # ast parsed cleanly; a tokenizer hiccup leaves best-effort blanking
+    return "\n".join("".join(row) for row in buf)
+
+
+def _parse(text: str) -> ast.Module | None:
+    try:
+        return ast.parse(text)
+    except _PARSE_ERRORS:
+        return None
+
+
+def _find_def(tree: ast.Module, qualname: str) -> ast.AST | None:
+    """Resolve a dotted qualname (``A.method``) to its def node, last definition
+    winning (runtime rebinding semantics). Descends ClassDef/FunctionDef bodies."""
+    node: ast.AST = tree
+    parts = qualname.split(".")
+    for part in parts:
+        body = getattr(node, "body", None)
+        if not isinstance(body, list):
+            return None
+        match: ast.AST | None = None
+        for child in body:
+            if (
+                isinstance(child, ast.FunctionDef | ast.AsyncFunctionDef | ast.ClassDef)
+                and child.name == part
+            ):
+                match = child  # last wins
+        if match is None:
+            return None
+        node = match
+    return node
+
+
+def _find_assign(tree: ast.Module, qualname: str) -> ast.expr | None:
+    """Resolve a dotted constant name (``C.LIMIT``) to its assigned RHS value node,
+    last assignment winning. Walks into ClassDef containers for the dotted prefix."""
+    parts = qualname.split(".")
+    *containers, name = parts
+    if not name:
+        return None
+    node: ast.AST = tree
+    for part in containers:
+        body = getattr(node, "body", None)
+        if not isinstance(body, list):
+            return None
+        match: ast.ClassDef | None = None
+        for child in body:
+            if isinstance(child, ast.ClassDef) and child.name == part:
+                match = child
+        if match is None:
+            return None
+        node = match
+    body = getattr(node, "body", None)
+    if not isinstance(body, list):
+        return None
+    found: ast.expr | None = None
+    for stmt in body:
+        if isinstance(stmt, ast.Assign):
+            for tgt in stmt.targets:
+                if isinstance(tgt, ast.Name) and tgt.id == name:
+                    found = stmt.value
+        elif (
+            isinstance(stmt, ast.AnnAssign)
+            and isinstance(stmt.target, ast.Name)
+            and stmt.target.id == name
+            and stmt.value is not None
+        ):
+            found = stmt.value
+    return found
+
+
+def _params(fn: ast.FunctionDef | ast.AsyncFunctionDef) -> list[dict]:
+    """Normalized parameter list: name + kind, plus annotation/default source reprs
+    (via ast.unparse, so spacing/quote style/int base are canonical)."""
+    a = fn.args
+    out: list[dict] = []
+    pos = a.posonlyargs + a.args
+    dmap = {
+        arg.arg: ast.unparse(d)
+        for arg, d in zip(pos[len(pos) - len(a.defaults) :], a.defaults, strict=False)
+    }
+    kwd = {
+        arg.arg: ast.unparse(d)
+        for arg, d in zip(a.kwonlyargs, a.kw_defaults, strict=False)
+        if d is not None
+    }
+
+    def emit(args: list[ast.arg], kind: str, defaults: dict[str, str]) -> None:
+        for arg in args:
+            out.append(
+                {
+                    "name": arg.arg,
+                    "kind": kind,
+                    "annotation": ast.unparse(arg.annotation) if arg.annotation else None,
+                    "default": defaults.get(arg.arg),
+                }
+            )
+
+    emit(a.posonlyargs, "posonly", dmap)
+    emit(a.args, "arg", dmap)
+    if a.vararg:
+        out.append(
+            {
+                "name": a.vararg.arg,
+                "kind": "vararg",
+                "annotation": ast.unparse(a.vararg.annotation) if a.vararg.annotation else None,
+                "default": None,
+            }
+        )
+    emit(a.kwonlyargs, "kwonly", kwd)
+    if a.kwarg:
+        out.append(
+            {
+                "name": a.kwarg.arg,
+                "kind": "kwarg",
+                "annotation": ast.unparse(a.kwarg.annotation) if a.kwarg.annotation else None,
+                "default": None,
+            }
+        )
+    return out
+
+
+def _compare_params(expected: list[dict], actual: list[dict]) -> str | None:
+    """Compare names/kinds/order always; annotations and defaults ONLY where the
+    expected spec provided them (so a names-only spec ignores annotations)."""
+    if len(expected) != len(actual):
+        return f"param count {len(actual)} != expected {len(expected)}"
+    for e, a in zip(expected, actual, strict=True):
+        if e["name"] != a["name"] or e["kind"] != a["kind"]:
+            return f"param {a['name']!r} ({a['kind']}) != expected {e['name']!r} ({e['kind']})"
+        if e["annotation"] is not None and e["annotation"] != a["annotation"]:
+            return (
+                f"annotation of {e['name']!r}: {a['annotation']!r} != expected {e['annotation']!r}"
+            )
+        if e["default"] is not None and e["default"] != a["default"]:
+            return f"default of {e['name']!r}: {a['default']!r} != expected {e['default']!r}"
+    return None
+
+
+def check_signature(text: str, needle: str) -> tuple[str, str]:
+    """``needle`` is ``<qualname>::<sigspec>``. ``sigspec`` is the parameter list as
+    written inside ``def f(...)`` (optionally ``-> ret``, optionally a leading
+    ``async``). Compares names/kinds/order always; annotations, defaults, the return
+    annotation, and async-ness only when the spec states them."""
+    qualname, sep, spec = needle.partition("::")
+    qualname, spec = qualname.strip(), spec.strip()
+    if not sep or not qualname:
+        return ("ERROR", "bad_program: py-signature needs <qualname>::<sigspec>")
+
+    async_required = False
+    if spec == "async" or spec.startswith("async "):
+        async_required = True
+        spec = spec[len("async") :].strip()
+    if " -> " in spec:
+        param_src, ret = spec.split(" -> ", 1)
+        arrow = f" -> {ret.strip()}"
+    else:
+        param_src, arrow = spec, ""
+    try:
+        probe = ast.parse(f"def __dorian_probe__({param_src}){arrow}: pass")
+    except _PARSE_ERRORS:
+        return ("ERROR", f"bad_program: cannot parse expected signature {spec!r}")
+    pfn = probe.body[0]
+    if not isinstance(pfn, ast.FunctionDef):
+        return ("ERROR", f"bad_program: cannot parse expected signature {spec!r}")
+
+    tree = _parse(text)
+    if tree is None:
+        return ("ERROR", "target_unparseable: not parseable python")
+    fn = _find_def(tree, qualname)
+    if not isinstance(fn, ast.FunctionDef | ast.AsyncFunctionDef):
+        return ("FAIL", f"function_missing: {qualname}")
+
+    if async_required and not isinstance(fn, ast.AsyncFunctionDef):
+        return ("FAIL", f"signature_mismatch: {qualname} is not async")
+    mismatch = _compare_params(_params(pfn), _params(fn))
+    if mismatch:
+        return ("FAIL", f"signature_mismatch: {qualname}: {mismatch}")
+    if arrow:
+        want_ret = ast.unparse(pfn.returns) if pfn.returns else None
+        got_ret = ast.unparse(fn.returns) if fn.returns else None
+        if want_ret != got_ret:
+            return ("FAIL", f"signature_mismatch: {qualname}: return {got_ret!r} != {want_ret!r}")
+    return ("PASS", f"signature ok: {qualname}")
+
+
+def check_const(text: str, needle: str) -> tuple[str, str]:
+    """``needle`` is ``<qualname>::<literal>``. Compares the assignment's literal
+    VALUE (via ``ast.literal_eval``), so quote style / int base / spacing are
+    tolerated and a comment/docstring mention cannot pass. A non-literal RHS ERRORs
+    (the value cannot be determined), never a vacuous PASS."""
+    qualname, sep, expected = needle.partition("::")
+    qualname, expected = qualname.strip(), expected.strip()
+    if not sep or not qualname:
+        return ("ERROR", "bad_program: py-const needs <qualname>::<value>")
+    if not expected:
+        return ("ERROR", "bad_program: py-const needs an expected value")
+    try:
+        want = ast.literal_eval(expected)
+    except _PARSE_ERRORS:
+        return ("ERROR", f"bad_program: expected value is not a python literal: {expected!r}")
+
+    tree = _parse(text)
+    if tree is None:
+        return ("ERROR", "target_unparseable: not parseable python")
+    rhs = _find_assign(tree, qualname)
+    if rhs is None:
+        return ("FAIL", f"const_missing: {qualname}")
+    try:
+        got = ast.literal_eval(rhs)
+    except _PARSE_ERRORS:
+        return ("ERROR", f"non_literal: {qualname} is not a literal constant")
+    if got == want:
+        return ("PASS", f"const ok: {qualname} == {expected}")
+    return ("FAIL", f"const_value_mismatch: {qualname} != {expected}")
diff --git a/src/dorian/seal.py b/src/dorian/seal.py
index 9954e12..d1384d6 100644
--- a/src/dorian/seal.py
+++ b/src/dorian/seal.py
@@ -37,6 +37,11 @@
 )
 from dorian.policy import ExecutionPolicy
 
+# C3 prefixes whose program is `<file>::<operand>` (so the watched file is the head
+# before `::`); `path:` is the exception (its whole operand is the path). Mirrors
+# `c3_ref._FILE_OPERAND_FORMS` — kept in sync so a new C3 subgrammar binds its file.
+_C3_FILE_OPERAND_FORMS = ("symbol", "string", "regex", "py-signature", "py-const", "code")
+
 
 class SealError(Exception):
     """Sealing refused: bad bindings or a checker that is not green right now."""
@@ -116,9 +121,9 @@ def _derive_watch(spec: CheckerSpec, readset: ReadSet) -> CheckerSpec:
         entry = next((e for e in readset.entries if e.id == spec.program), None)
         if entry is not None:
             watch = (entry.uri,)
-    elif spec.type == "C3":  # path:<p> | (symbol|string|regex):<file>::<operand>
+    elif spec.type == "C3":  # path:<p> | (symbol|string|regex|py-*|code):<file>::<operand>
         prefix, _, rest = spec.program.partition(":")
-        file = rest.partition("::")[0] if prefix in ("symbol", "string", "regex") else rest
+        file = rest.partition("::")[0] if prefix in _C3_FILE_OPERAND_FORMS else rest
         if file:
             watch = (file,)
     elif spec.type == "C4":  # pytest:<nodeid>: the nodeid's file part is the binding
@@ -191,7 +196,7 @@ def add(p: str) -> None:
                 )
             prefix, _, rest = spec.program.partition(":")
             if spec.type == "C3":
-                add(rest.partition("::")[0] if prefix in ("symbol", "string", "regex") else rest)
+                add(rest.partition("::")[0] if prefix in _C3_FILE_OPERAND_FORMS else rest)
             elif spec.type == "C4":
                 if prefix == "pytest":  # match _derive_watch; other C4 forms ERROR at seal
                     add(rest.partition("::")[0])
diff --git a/src/dorian/strength.py b/src/dorian/strength.py
new file mode 100644
index 0000000..21f35dc
--- /dev/null
+++ b/src/dorian/strength.py
@@ -0,0 +1,223 @@
+"""Checker-strength and claim-risk diagnostics — make TRUTH strength visible.
+
+The protocol keeps two questions apart (``docs/VALIDATION_HONESTY.md``):
+- **Trigger coverage** — WHEN a claim is re-checked (binding; ``bindings.py``).
+- **Truth strength** — WHETHER a checker can actually FALSIFY the claim (here).
+
+A green seal says every backed claim held at seal time; it does NOT say the checker
+is strong enough to catch a future drift. This module scores that second axis:
+
+1. classify each checker's truth strength (``existence`` < ``raw_text`` <
+   ``semantic_text`` < ``snapshot`` < ``data`` < ``structural`` < ``behavioral``;
+   ``shell_executable`` is opaque), and the strongest backing per claim;
+2. flag an ``adequacy_mismatch`` when the claim's ``kind`` needs more than its
+   checkers provide (a ``behavior`` claim with only an existence/text checker; a
+   ``quantity`` claim with only an existence checker);
+3. run an advisory C4 test-adequacy lint (a bound pytest node with no assertions, or
+   only a constant assertion, passes vacuously);
+4. roll those into a per-claim ``risk`` level (``high``/``medium``/``low``).
+
+It is purely advisory and deterministic: it never executes a checker, never changes
+a verdict, trust state, or exit code, and reports repo-relative facts only. C4
+adequacy parses the test file's AST (read-only); it never runs the test.
+"""
+
+from __future__ import annotations
+
+import ast
+from collections.abc import Sequence
+from pathlib import Path
+
+from dorian import pyast
+from dorian.model import CheckerSpec, Claim
+from dorian.policy import executable_kind
+
+# Truth-strength rank (higher = can falsify more). `shell_executable` is opaque —
+# it may be strong or vacuous — so it is ranked low and flagged, never trusted as
+# the strongest backing on reputation alone.
+_RANK = {
+    "unbacked": 0,
+    "shell_executable": 1,
+    "existence": 2,
+    "raw_text": 3,
+    "semantic_text": 4,
+    "snapshot": 5,
+    "data": 6,
+    "structural": 7,
+    "behavioral": 8,
+}
+
+_C3_STRENGTH = {
+    "path": "existence",
+    "symbol": "existence",
+    "string": "raw_text",
+    "regex": "raw_text",
+    "code": "semantic_text",
+    "py-signature": "structural",
+    "py-const": "structural",
+}
+
+# WEAK strengths that under-verify a 'behavior' claim ('structural' is not listed)
+_WEAK_FOR_BEHAVIOR = {"existence", "raw_text", "semantic_text", "snapshot", "data"}
+
+
+def checker_strength(spec: CheckerSpec) -> str:
+    """Classify a single checker's truth strength (see module docstring)."""
+    if spec.type == "C1":
+        return "snapshot"  # exact span-hash: snapshot-grade content match
+    if spec.type == "C4":
+        return "behavioral"
+    if spec.type == "C5":
+        form = spec.program.partition(":")[0]
+        if form == "shell":
+            return "shell_executable"
+        if form == "snapshot":
+            return "snapshot"
+        return "data"  # rowcount/schema/nullrate/domain/freshness/reconcile
+    if spec.type == "C3":
+        return _C3_STRENGTH.get(spec.program.partition(":")[0], "raw_text")
+    return "raw_text"  # unknown checker type: conservative
+
+
+def claim_strength(claim: Claim) -> str:
+    """The strongest backing across a claim's checkers; ``unbacked`` if none."""
+    if not claim.checkers:
+        return "unbacked"
+    return max((checker_strength(s) for s in claim.checkers), key=lambda s: _RANK.get(s, 0))
+
+
+def c4_adequacy(repo: Path, spec: CheckerSpec) -> list[str]:
+    """Advisory: does a C4 ``pytest:`` node actually assert anything? Parses the test
+    file's AST (no execution). Returns at most one note. Silent (``[]``) when the
+    node cannot be located statically (the checker itself reports ``test_gone``) or
+    when assertions / assertion helpers / ``pytest.raises`` are present."""
+    prefix, _, nodeid = spec.program.partition(":")
+    if prefix != "pytest":
+        return []
+    file, _, rest = nodeid.partition("::")
+    if not file.strip() or not rest.strip():
+        return []  # whole-file node or malformed: not the linter's call
+    path = repo / file.strip()
+    if not path.is_file():
+        return []
+    try:
+        text = path.read_text(encoding="utf-8", errors="replace")
+    except OSError:
+        return []
+    tree = pyast._parse(text)
+    if tree is None:
+        return []
+    fn = pyast._find_def(tree, rest.strip().replace("::", "."))
+    if not isinstance(fn, ast.FunctionDef | ast.AsyncFunctionDef):
+        return []  # not found by static walk: do not guess
+    if _has_assertion_helper(fn):
+        return []  # self.assert* / raises present: silent even beside `assert True`
+    asserts = [n for n in ast.walk(fn) if isinstance(n, ast.Assert)]
+    if asserts:
+        if all(isinstance(a.test, ast.Constant) and bool(a.test.value) for a in asserts):
+            return [f"c4_adequacy: {rest.strip()} asserts only a constant (vacuous)"]
+        return []
+    return [f"c4_adequacy: {rest.strip()} has no assertions (may pass vacuously; low confidence)"]
+
+
+def _has_assertion_helper(fn: ast.AST) -> bool:
+    """unittest ``self.assert*`` calls or a ``pytest.raises`` / bare ``raises`` context
+    count as assertions for the adequacy lint (conservative: avoid false warnings)."""
+    for node in ast.walk(fn):
+        if isinstance(node, ast.Attribute) and node.attr.startswith("assert"):
+            return True
+        if isinstance(node, ast.Call):
+            f = node.func
+            if isinstance(f, ast.Attribute) and f.attr == "raises":
+                return True
+            if isinstance(f, ast.Name) and f.id == "raises":
+                return True
+    return False
+
+
+def adequacy_notes(repo: Path, claim: Claim) -> list[str]:
+    """Kind-vs-strength mismatches plus C4 adequacy notes for one claim."""
+    if not claim.checkers:
+        return []
+    notes: list[str] = []
+    strongest = claim_strength(claim)
+    behavioral_backed = any(checker_strength(s) == "behavioral" for s in claim.checkers)
+    if claim.kind == "behavior" and not behavioral_backed and strongest in _WEAK_FOR_BEHAVIOR:
+        notes.append(
+            f"adequacy_mismatch: 'behavior' claim backed only by {strongest}"
+            " — only a C4 pytest checker proves behavior"
+        )
+    if claim.kind == "quantity" and all(checker_strength(s) == "existence" for s in claim.checkers):
+        notes.append(
+            "adequacy_mismatch: 'quantity' claim backed only by an existence checker"
+            " — use py-const:/anchored regex:/typed C5 to verify the value"
+        )
+    for spec in claim.checkers:
+        if spec.type == "C4":
+            notes.extend(c4_adequacy(repo, spec))
+    return notes
+
+
+def claim_risk(
+    claim: Claim, flags: Sequence[str], adequacy: Sequence[str]
+) -> tuple[str, list[str]]:
+    """Roll strength + binding flags + adequacy into a level + reasons. Deterministic.
+    Non-load-bearing claims never score ``high`` (a soft claim is the author's call)."""
+    reasons: list[str] = []
+    level = "low"
+    if not claim.checkers:
+        reasons.append("unbacked")
+        level = "high" if claim.load_bearing else "low"
+    else:
+        strongest = claim_strength(claim)
+        if adequacy:
+            reasons.append("adequacy_mismatch")
+            if claim.load_bearing:
+                level = "high" if strongest in ("existence", "raw_text") else "medium"
+    high_binding = {
+        "short-literal",
+        "ambiguous-mention",
+        "trigger-only-symbol",
+        "unwatched-mention",
+    }
+    for f in flags:
+        if f in high_binding:
+            reasons.append(f"binding:{f}")
+            if claim.load_bearing and level == "low":
+                level = "medium"
+    return level, reasons
+
+
+def analyze(
+    repo: Path,
+    claims: Sequence[Claim],
+    flags_by_id: dict[str, Sequence[str]] | None = None,
+) -> list[dict]:
+    """Per-claim strength/risk diagnostics, in claim order. ``flags_by_id`` (from
+    ``bindings.analyze``) feeds binding-flag risk reasons when available."""
+    flags_by_id = flags_by_id or {}
+    out: list[dict] = []
+    for c in claims:
+        adequacy = adequacy_notes(repo, c)
+        level, reasons = claim_risk(c, flags_by_id.get(c.id, ()), adequacy)
+        out.append(
+            {
+                "claim_id": c.id,
+                "kind": c.kind,
+                "load_bearing": c.load_bearing,
+                "strength": claim_strength(c),
+                "executes": sorted({k for s in c.checkers if (k := executable_kind(s))}),
+                "adequacy": adequacy,
+                "risk": level,
+                "reasons": reasons,
+            }
+        )
+    return out
+
+
+def summary_line(diags: list[dict]) -> str:
+    """One deterministic line: risk-level counts. For CLI summaries."""
+    counts = {"high": 0, "medium": 0, "low": 0}
+    for d in diags:
+        counts[d["risk"]] = counts.get(d["risk"], 0) + 1
+    return f"claim-risk: {counts['high']} high, {counts['medium']} medium, {counts['low']} low"
diff --git a/tests/test_pystructural.py b/tests/test_pystructural.py
new file mode 100644
index 0000000..76976e4
--- /dev/null
+++ b/tests/test_pystructural.py
@@ -0,0 +1,290 @@
+"""C3 Python structural checkers: `py-signature:` and `py-const:`.
+
+These close two documented weak-verdict ceilings of `symbol:`/`string:`/`regex:`:
+- `symbol:` proves a name still EXISTS; it cannot see a signature change.
+- `string:`/`regex:` search raw file text, so a fact surviving only in a comment,
+  docstring, or dead literal still passes.
+
+Both new forms parse the target file's AST (stdlib `ast`), so:
+- they are tolerant of formatting (whitespace, quote style, int base) because the
+  comparison is over parsed structure / literal VALUES, never raw text;
+- they cannot be fooled by a mention in a comment or docstring (the AST has no
+  such node for the claimed symbol);
+- they FAIL on a real structural/value drift, and ERROR (never FAIL) when the
+  checker itself cannot run (unparseable target, malformed program, non-literal
+  constant), so a degenerate program never produces a vacuous PASS;
+- they remain READ-ONLY and never execute user code.
+
+The honest ceiling stays visible: a body-only ("gutted body") change keeps the
+signature identical, so `py-signature:` PASSes — only a behavior checker (C4) catches
+that. The test for it asserts PASS and the docs state the ceiling.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from dorian.checkers.base import CheckContext, Verdict, run_checker
+from dorian.model import CheckerSpec, Claim
+
+
+def _run(repo: Path, program: str) -> object:
+    claim = Claim(
+        id="c",
+        text="x",
+        kind="behavior",
+        load_bearing=False,
+        checkers=(CheckerSpec(type="C3", program=program),),
+    )
+    return run_checker(CheckContext(repo=repo, claim=claim), 0)
+
+
+def _w(repo: Path, rel: str, content: str) -> None:
+    p = repo / rel
+    p.parent.mkdir(parents=True, exist_ok=True)
+    p.write_text(content, encoding="utf-8")
+
+
+# --- py-signature -------------------------------------------------------------
+
+
+AUTH = '''"""mod."""
+
+
+def verify_token(token: str) -> bool:
+    """Verify an RS256 JWT."""
+    return bool(token)
+'''
+
+
+def test_py_signature_unchanged_passes(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", AUTH)
+    assert (
+        _run(tmp_path, "py-signature:m.py::verify_token::token: str -> bool").verdict
+        is Verdict.PASS
+    )
+
+
+def test_py_signature_names_only_ignores_unspecified_annotations(tmp_path: Path) -> None:
+    """A spec that lists only param names compares only names/order/kind — the
+    annotation and return are NOT specified, so they are not compared."""
+    _w(tmp_path, "m.py", AUTH)
+    assert _run(tmp_path, "py-signature:m.py::verify_token::token").verdict is Verdict.PASS
+
+
+def test_py_signature_param_rename_fails(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", AUTH)
+    res = _run(tmp_path, "py-signature:m.py::verify_token::tok")
+    assert res.verdict is Verdict.FAIL
+    assert "signature" in res.detail
+
+
+def test_py_signature_param_count_and_order_fail(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", "def f(a, b):\n    return a\n")
+    assert _run(tmp_path, "py-signature:m.py::f::a").verdict is Verdict.FAIL  # missing b
+    assert _run(tmp_path, "py-signature:m.py::f::b, a").verdict is Verdict.FAIL  # reordered
+    assert _run(tmp_path, "py-signature:m.py::f::a, b").verdict is Verdict.PASS
+
+
+def test_py_signature_default_change(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", "def f(x=1):\n    return x\n")
+    assert _run(tmp_path, "py-signature:m.py::f::x=2").verdict is Verdict.FAIL
+    assert _run(tmp_path, "py-signature:m.py::f::x=1").verdict is Verdict.PASS
+    # a spec that omits the default does not compare it (only name/kind/order checked)
+    assert _run(tmp_path, "py-signature:m.py::f::x").verdict is Verdict.PASS
+
+
+def test_py_signature_return_annotation_change(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", AUTH)
+    assert _run(tmp_path, "py-signature:m.py::verify_token::token -> int").verdict is Verdict.FAIL
+    assert _run(tmp_path, "py-signature:m.py::verify_token::token -> bool").verdict is Verdict.PASS
+
+
+def test_py_signature_formatting_only_passes(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", "def f( x ,y ,  z ):\n    return x\n")
+    assert _run(tmp_path, "py-signature:m.py::f::x, y, z").verdict is Verdict.PASS
+
+
+def test_py_signature_gutted_body_still_passes_documented_ceiling(tmp_path: Path) -> None:
+    """The signature is unchanged but the body is inverted: py-signature PASSes.
+    This is the documented trigger-vs-truth ceiling — only a behavior checker (C4)
+    catches a body-only change. The test pins the ceiling so it cannot regress
+    into a silent over-promise."""
+    _w(tmp_path, "m.py", "def is_admin(user):\n    return user.role == 'admin'\n")
+    before = _run(tmp_path, "py-signature:m.py::is_admin::user")
+    _w(tmp_path, "m.py", "def is_admin(user):\n    return True  # gutted\n")
+    after = _run(tmp_path, "py-signature:m.py::is_admin::user")
+    assert before.verdict is Verdict.PASS and after.verdict is Verdict.PASS
+
+
+def test_py_signature_missing_function_fails(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", AUTH)
+    res = _run(tmp_path, "py-signature:m.py::nope::x")
+    assert res.verdict is Verdict.FAIL
+    assert "missing" in res.detail
+
+
+def test_py_signature_async_flag(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", "async def fetch(url):\n    return url\n")
+    assert _run(tmp_path, "py-signature:m.py::fetch::async url").verdict is Verdict.PASS
+    # async not specified -> not compared (sync/async both accepted)
+    assert _run(tmp_path, "py-signature:m.py::fetch::url").verdict is Verdict.PASS
+    # require async on a sync function -> FAIL
+    _w(tmp_path, "m.py", "def fetch(url):\n    return url\n")
+    assert _run(tmp_path, "py-signature:m.py::fetch::async url").verdict is Verdict.FAIL
+
+
+def test_py_signature_method_via_dotted_qualname(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", "class A:\n    def login(self, user, pw):\n        return True\n")
+    assert _run(tmp_path, "py-signature:m.py::A.login::self, user, pw").verdict is Verdict.PASS
+    assert _run(tmp_path, "py-signature:m.py::A.login::self, user").verdict is Verdict.FAIL
+
+
+def test_py_signature_unparseable_target_is_error(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", "def f(:::\n")  # syntax error
+    assert _run(tmp_path, "py-signature:m.py::f::x").verdict is Verdict.ERROR
+
+
+def test_py_signature_bad_spec_is_error(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", AUTH)
+    assert _run(tmp_path, "py-signature:m.py::verify_token::((bad").verdict is Verdict.ERROR
+    # empty needle (no qualname) is a bad program
+    assert _run(tmp_path, "py-signature:m.py::").verdict is Verdict.ERROR
+
+
+def test_py_signature_missing_file_is_fail(tmp_path: Path) -> None:
+    assert _run(tmp_path, "py-signature:gone.py::f::x").verdict is Verdict.FAIL
+
+
+def test_py_signature_path_escape_is_error(tmp_path: Path) -> None:
+    assert _run(tmp_path, "py-signature:../../etc/passwd::f::x").verdict is Verdict.ERROR
+
+
+def test_py_signature_comment_cannot_create_false_pass(tmp_path: Path) -> None:
+    """A function that exists only as text inside a comment is not in the AST."""
+    _w(tmp_path, "m.py", "# def ghost(a, b): pass\nVALUE = 1\n")
+    assert _run(tmp_path, "py-signature:m.py::ghost::a, b").verdict is Verdict.FAIL
+
+
+# --- py-const -----------------------------------------------------------------
+
+CONFIG = 'TIMEOUT = 30\nRETRIES = 3\nALGO = "RS256"\nPORT: int = 8080\n'
+
+
+def test_py_const_unchanged_passes(tmp_path: Path) -> None:
+    _w(tmp_path, "c.py", CONFIG)
+    assert _run(tmp_path, "py-const:c.py::TIMEOUT::30").verdict is Verdict.PASS
+
+
+def test_py_const_value_change_fails(tmp_path: Path) -> None:
+    _w(tmp_path, "c.py", CONFIG.replace("TIMEOUT = 30", "TIMEOUT = 10"))
+    res = _run(tmp_path, "py-const:c.py::TIMEOUT::30")
+    assert res.verdict is Verdict.FAIL
+    assert "value" in res.detail or "const" in res.detail
+
+
+def test_py_const_formatting_and_base_tolerant(tmp_path: Path) -> None:
+    _w(tmp_path, "c.py", "TIMEOUT  =  0x1E\n")  # hex 30, extra spaces
+    assert _run(tmp_path, "py-const:c.py::TIMEOUT::30").verdict is Verdict.PASS
+
+
+def test_py_const_string_quote_tolerant(tmp_path: Path) -> None:
+    _w(tmp_path, "c.py", CONFIG)
+    assert _run(tmp_path, 'py-const:c.py::ALGO::"RS256"').verdict is Verdict.PASS
+    assert _run(tmp_path, "py-const:c.py::ALGO::'RS256'").verdict is Verdict.PASS
+    assert _run(tmp_path, 'py-const:c.py::ALGO::"HS256"').verdict is Verdict.FAIL
+
+
+def test_py_const_annassign(tmp_path: Path) -> None:
+    _w(tmp_path, "c.py", CONFIG)
+    assert _run(tmp_path, "py-const:c.py::PORT::8080").verdict is Verdict.PASS
+    assert _run(tmp_path, "py-const:c.py::PORT::9090").verdict is Verdict.FAIL
+
+
+def test_py_const_class_attribute_via_dotted(tmp_path: Path) -> None:
+    _w(tmp_path, "c.py", "class C:\n    LIMIT = 5\n")
+    assert _run(tmp_path, "py-const:c.py::C.LIMIT::5").verdict is Verdict.PASS
+    assert _run(tmp_path, "py-const:c.py::C.LIMIT::6").verdict is Verdict.FAIL
+
+
+def test_py_const_missing_is_fail(tmp_path: Path) -> None:
+    _w(tmp_path, "c.py", CONFIG)
+    res = _run(tmp_path, "py-const:c.py::NOPE::1")
+    assert res.verdict is Verdict.FAIL
+    assert "missing" in res.detail
+
+
+def test_py_const_non_literal_rhs_is_error(tmp_path: Path) -> None:
+    """A non-literal RHS cannot be compared to a literal value: ERROR (the checker
+    cannot determine the value), never a vacuous PASS or a false FAIL."""
+    _w(tmp_path, "c.py", "TIMEOUT = compute_timeout()\n")
+    assert _run(tmp_path, "py-const:c.py::TIMEOUT::30").verdict is Verdict.ERROR
+
+
+def test_py_const_bad_expected_value_is_error(tmp_path: Path) -> None:
+    _w(tmp_path, "c.py", CONFIG)
+    assert _run(tmp_path, "py-const:c.py::TIMEOUT::not-a-literal(").verdict is Verdict.ERROR
+
+
+def test_py_const_comment_and_docstring_survival_does_not_pass(tmp_path: Path) -> None:
+    """The value surviving only in a comment or docstring is not an assignment."""
+    _w(tmp_path, "c.py", '"""TIMEOUT = 30 in the docstring."""\n# TIMEOUT = 30\nOTHER = 1\n')
+    assert _run(tmp_path, "py-const:c.py::TIMEOUT::30").verdict is Verdict.FAIL
+
+
+# --- end-to-end: the new forms bind, seal born-verifiable, and re-check ---------
+
+
+def test_structural_forms_verify_seal_and_revalidate(fixture_repo: Path) -> None:
+    """A py-signature, a py-const, and a code: claim auto-capture their file, seal
+    born-verifiable (all hold now), and on the canonical drift commit:
+    - the renamed-but-unchanged signature stays VERIFIED (rename-resolved),
+    - the changed constant and the removed route fold BROKEN -> REVOKED (exit 4)."""
+    import json
+
+    from conftest import apply_three_change_commit, git
+    from dorian import cli
+
+    claims = {
+        "claims": [
+            {
+                "id": "vt-sig",
+                "text": "verify_token(token) is defined in src/auth.py.",
+                "kind": "behavior",
+                "load_bearing": True,
+                "checkers": [
+                    {
+                        "type": "C3",
+                        "program": "py-signature:src/auth.py::verify_token::token: str -> bool",
+                    }
+                ],
+            },
+            {
+                "id": "timeout-30",
+                "text": "The default request timeout is 30 seconds.",
+                "kind": "quantity",
+                "load_bearing": True,
+                "checkers": [{"type": "C3", "program": "py-const:src/config.py::TIMEOUT::30"}],
+            },
+            {
+                "id": "login-route",
+                "text": "Login is served at /v1/login.",
+                "kind": "reference",
+                "load_bearing": True,
+                "checkers": [{"type": "C3", "program": "code:src/routes.py::/v1/login"}],
+            },
+        ]
+    }
+    cp = fixture_repo / "claims.json"
+    cp.write_text(json.dumps(claims), encoding="utf-8")
+    base = git(fixture_repo, "rev-parse", "HEAD")
+
+    assert (
+        cli.main(["--repo", str(fixture_repo), "verify", "docs/design.md", "--claims", str(cp)])
+        == 0
+    )
+    assert (fixture_repo / "docs/design.md.warrant").is_file()
+
+    apply_three_change_commit(fixture_repo)
+    rc = cli.main(["--repo", str(fixture_repo), "revalidate", "--since", base])
+    assert rc == cli.EXIT_REVOKED  # a load-bearing claim broke -> exit 4
diff --git a/tests/test_semantic_context.py b/tests/test_semantic_context.py
new file mode 100644
index 0000000..d045988
--- /dev/null
+++ b/tests/test_semantic_context.py
@@ -0,0 +1,106 @@
+"""C3 `code:` — regex over comment/docstring-stripped Python.
+
+`string:`/`regex:` search raw file text, so a fact that survives only in a comment,
+docstring, or dead literal passes. `code:` parses the file and blanks comments and
+docstrings before matching, so the SAME fact in actual code (an assignment, a call
+arg, a dict-key route string, a decorator) still matches, but a comment/docstring
+survival does not. It is Python-only (the comment/docstring model is Python's);
+a non-Python target ERRORs (cannot run), never silently passes.
+
+These pin the WP4 acceptance matrix and the contrast against raw `string:`/`regex:`.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from dorian.checkers.base import CheckContext, Verdict, run_checker
+from dorian.model import CheckerSpec, Claim
+
+
+def _run(repo: Path, program: str) -> object:
+    claim = Claim(
+        id="c",
+        text="x",
+        kind="reference",
+        load_bearing=False,
+        checkers=(CheckerSpec(type="C3", program=program),),
+    )
+    return run_checker(CheckContext(repo=repo, claim=claim), 0)
+
+
+def _w(repo: Path, rel: str, content: str) -> None:
+    p = repo / rel
+    p.parent.mkdir(parents=True, exist_ok=True)
+    p.write_text(content, encoding="utf-8")
+
+
+def test_code_matches_real_assignment(tmp_path: Path) -> None:
+    _w(tmp_path, "config.py", "TIMEOUT = 30\n")
+    assert _run(tmp_path, r"code:config.py::TIMEOUT\s*=\s*30").verdict is Verdict.PASS
+
+
+def test_code_formatting_tolerant(tmp_path: Path) -> None:
+    _w(tmp_path, "config.py", "TIMEOUT=30\n")
+    assert _run(tmp_path, r"code:config.py::TIMEOUT\s*=\s*30").verdict is Verdict.PASS
+
+
+def test_code_ignores_comment_survival(tmp_path: Path) -> None:
+    """The fact lives ONLY in a comment: raw string:/regex: pass, code: FAILs."""
+    src = "# TIMEOUT = 30 (old default)\nTIMEOUT = 10\n"
+    _w(tmp_path, "config.py", src)
+    # raw text search still sees the comment -> PASS (the false-pass class)
+    assert _run(tmp_path, r"regex:config.py::TIMEOUT\s*=\s*30").verdict is Verdict.PASS
+    # code: ignores the comment -> FAIL (no real assignment of 30)
+    res = _run(tmp_path, r"code:config.py::TIMEOUT\s*=\s*30")
+    assert res.verdict is Verdict.FAIL
+    assert "code_missing" in res.detail
+
+
+def test_code_ignores_docstring_survival(tmp_path: Path) -> None:
+    src = '"""The default TIMEOUT = 30 historically."""\nTIMEOUT = 10\n'
+    _w(tmp_path, "config.py", src)
+    assert _run(tmp_path, r"string:config.py::TIMEOUT = 30").verdict is Verdict.PASS  # raw
+    assert _run(tmp_path, r"code:config.py::TIMEOUT\s*=\s*30").verdict is Verdict.FAIL  # semantic
+
+
+def test_code_keeps_real_string_literals_route(tmp_path: Path) -> None:
+    """A route path lives inside a real string literal (dict key) — kept, not a
+    docstring. code: must still find it."""
+    _w(tmp_path, "routes.py", 'ROUTES = {\n    "/v1/login": "auth.login",\n}\n')
+    assert _run(tmp_path, "code:routes.py::/v1/login").verdict is Verdict.PASS
+
+
+def test_code_keeps_decorator_argument(tmp_path: Path) -> None:
+    _w(tmp_path, "api.py", '@app.route("/health")\ndef health():\n    return "ok"\n')
+    assert _run(tmp_path, "code:api.py::/health").verdict is Verdict.PASS
+
+
+def test_code_keeps_call_argument(tmp_path: Path) -> None:
+    _w(tmp_path, "api.py", "connect(timeout=30, retries=3)\n")
+    assert _run(tmp_path, r"code:api.py::timeout\s*=\s*30").verdict is Verdict.PASS
+
+
+def test_code_non_python_target_is_error(tmp_path: Path) -> None:
+    _w(tmp_path, "notes.txt", "this is not python: def x(::\n")
+    res = _run(tmp_path, "code:notes.txt::def x")
+    assert res.verdict is Verdict.ERROR
+    assert "unparseable" in res.detail
+
+
+def test_code_missing_file_is_fail(tmp_path: Path) -> None:
+    assert _run(tmp_path, "code:gone.py::anything").verdict is Verdict.FAIL
+
+
+def test_code_over_length_pattern_is_error(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", "x = 1\n")
+    assert _run(tmp_path, "code:m.py::" + ("a" * 600)).verdict is Verdict.ERROR
+
+
+def test_code_bad_regex_is_error(tmp_path: Path) -> None:
+    _w(tmp_path, "m.py", "x = 1\n")
+    assert _run(tmp_path, "code:m.py::(unclosed").verdict is Verdict.ERROR
+
+
+def test_code_path_escape_is_error(tmp_path: Path) -> None:
+    assert _run(tmp_path, "code:../../etc/passwd::root").verdict is Verdict.ERROR
diff --git a/tests/test_strength.py b/tests/test_strength.py
new file mode 100644
index 0000000..4977199
--- /dev/null
+++ b/tests/test_strength.py
@@ -0,0 +1,219 @@
+"""Checker-strength and claim-risk diagnostics (advisory; no execution).
+
+Pins the WP2 acceptance matrix: a deterministic classification of each checker's
+TRUTH strength (distinct from trigger coverage), kind-vs-strength adequacy
+mismatches, an advisory C4 test-adequacy lint (WP6: zero/constant assertions), and
+a per-claim risk level. Everything here is advisory — it never runs a checker,
+changes a verdict/trust state, or moves an exit code.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from dorian import strength
+from dorian.model import CheckerSpec, Claim
+
+
+def _claim(kind: str, load_bearing: bool, *programs: tuple[str, str]) -> Claim:
+    return Claim(
+        id="c",
+        text="x",
+        kind=kind,
+        load_bearing=load_bearing,
+        checkers=tuple(CheckerSpec(type=t, program=p) for t, p in programs),
+    )
+
+
+# --- checker strength classification ------------------------------------------
+
+
+def test_checker_strength_classification() -> None:
+    f = strength.checker_strength
+    assert f(CheckerSpec(type="C3", program="path:src/x.py")) == "existence"
+    assert f(CheckerSpec(type="C3", program="symbol:src/x.py::F")) == "existence"
+    assert f(CheckerSpec(type="C3", program="string:src/x.py::lit")) == "raw_text"
+    assert f(CheckerSpec(type="C3", program="regex:src/x.py::p")) == "raw_text"
+    assert f(CheckerSpec(type="C3", program="code:src/x.py::p")) == "semantic_text"
+    assert f(CheckerSpec(type="C3", program="py-signature:src/x.py::F::a")) == "structural"
+    assert f(CheckerSpec(type="C3", program="py-const:src/x.py::K::1")) == "structural"
+    assert f(CheckerSpec(type="C4", program="pytest:t.py::test_a")) == "behavioral"
+    assert f(CheckerSpec(type="C5", program="rowcount:d.csv::>0")) == "data"
+    assert f(CheckerSpec(type="C5", program="snapshot:d.csv")) == "snapshot"
+    assert f(CheckerSpec(type="C5", program="shell:grep x f")) == "shell_executable"
+    assert f(CheckerSpec(type="C1", program="rs-0")) == "snapshot"
+
+
+def test_claim_strength_is_the_strongest_backing(tmp_path: Path) -> None:
+    c = _claim("behavior", True, ("C3", "symbol:a.py::F"), ("C4", "pytest:t.py::test_a"))
+    assert strength.claim_strength(c) == "behavioral"  # the C4 dominates the symbol
+    assert strength.claim_strength(_claim("fact", True)) == "unbacked"
+
+
+# --- adequacy mismatch (kind vs strength) -------------------------------------
+
+
+def test_behavior_claim_backed_only_by_symbol_warns(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(tmp_path, [_claim("behavior", True, ("C3", "symbol:a.py::F"))])
+    assert rec["strength"] == "existence"
+    assert any("adequacy_mismatch" in n for n in rec["adequacy"])
+    assert rec["risk"] == "high"
+
+
+def test_behavior_claim_backed_only_by_regex_warns(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(tmp_path, [_claim("behavior", True, ("C3", "regex:a.py::F\\("))])
+    assert any("adequacy_mismatch" in n for n in rec["adequacy"])  # raw_text is weak for behavior
+
+
+def test_behavior_claim_backed_by_pytest_has_no_mismatch(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(tmp_path, [_claim("behavior", True, ("C4", "pytest:t.py::test_a"))])
+    # the test file does not exist here, so the C4 adequacy lint stays silent (the
+    # checker itself reports test_gone) — and there is no kind/strength mismatch
+    assert not any("adequacy_mismatch" in n for n in rec["adequacy"])
+    assert rec["strength"] == "behavioral"
+
+
+def test_quantity_claim_with_value_checker_has_no_mismatch(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(tmp_path, [_claim("quantity", True, ("C3", "py-const:c.py::T::30"))])
+    assert rec["adequacy"] == []
+    assert rec["risk"] == "low"
+
+
+def test_quantity_claim_backed_only_by_existence_warns(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(tmp_path, [_claim("quantity", True, ("C3", "symbol:c.py::T"))])
+    assert any("adequacy_mismatch" in n for n in rec["adequacy"])
+
+
+def test_data_claim_with_typed_c5_has_no_mismatch(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(tmp_path, [_claim("quantity", True, ("C5", "rowcount:d.csv::>0"))])
+    assert rec["adequacy"] == []
+
+
+# --- unbacked risk ------------------------------------------------------------
+
+
+def test_unbacked_load_bearing_is_high_risk(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(tmp_path, [_claim("fact", True)])
+    assert rec["risk"] == "high"
+    assert "unbacked" in rec["reasons"]
+
+
+def test_unbacked_non_load_bearing_is_not_high(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(tmp_path, [_claim("fact", False)])
+    assert rec["risk"] in ("low", "medium")  # advisory, not high for a non-load-bearing claim
+    assert rec["risk"] != "high"
+
+
+# --- C4 test adequacy lint (WP6) ----------------------------------------------
+
+
+def _w(repo: Path, rel: str, content: str) -> None:
+    p = repo / rel
+    p.parent.mkdir(parents=True, exist_ok=True)
+    p.write_text(content, encoding="utf-8")
+
+
+def test_c4_zero_assertion_test_warns(tmp_path: Path) -> None:
+    _w(tmp_path, "t.py", "def test_a():\n    do_something()\n")
+    notes = strength.c4_adequacy(tmp_path, CheckerSpec(type="C4", program="pytest:t.py::test_a"))
+    assert any("no assertion" in n.lower() for n in notes)
+
+
+def test_c4_assert_constant_warns(tmp_path: Path) -> None:
+    _w(tmp_path, "t.py", "def test_a():\n    assert True\n")
+    notes = strength.c4_adequacy(tmp_path, CheckerSpec(type="C4", program="pytest:t.py::test_a"))
+    assert any("constant" in n.lower() for n in notes)
+
+
+def test_c4_normal_asserting_test_is_silent(tmp_path: Path) -> None:
+    _w(tmp_path, "t.py", "def test_a():\n    x = f()\n    assert x == 42\n")
+    notes = strength.c4_adequacy(tmp_path, CheckerSpec(type="C4", program="pytest:t.py::test_a"))
+    assert notes == []
+
+
+def test_c4_assertion_helper_is_not_flagged(tmp_path: Path) -> None:
+    """unittest-style assertion methods and pytest.raises count as assertions."""
+    _w(tmp_path, "t.py", "class T:\n    def test_a(self):\n        self.assertEqual(f(), 42)\n")
+    notes = strength.c4_adequacy(tmp_path, CheckerSpec(type="C4", program="pytest:t.py::T::test_a"))
+    assert notes == []
+    _w(
+        tmp_path,
+        "t2.py",
+        "import pytest\ndef test_b():\n    with pytest.raises(ValueError):\n        f()\n",
+    )
+    notes = strength.c4_adequacy(tmp_path, CheckerSpec(type="C4", program="pytest:t2.py::test_b"))
+    assert notes == []
+
+
+def test_c4_missing_or_unfound_node_is_silent(tmp_path: Path) -> None:
+    """The checker itself reports test_gone; the linter does not guess."""
+    assert strength.c4_adequacy(tmp_path, CheckerSpec(type="C4", program="pytest:gone.py::t")) == []
+    _w(tmp_path, "t.py", "def test_a():\n    assert 1\n")
+    assert (
+        strength.c4_adequacy(tmp_path, CheckerSpec(type="C4", program="pytest:t.py::test_z")) == []
+    )
+
+
+def test_c4_adequacy_surfaces_in_behavior_claim(tmp_path: Path) -> None:
+    _w(tmp_path, "t.py", "def test_a():\n    f()\n")
+    (rec,) = strength.analyze(tmp_path, [_claim("behavior", True, ("C4", "pytest:t.py::test_a"))])
+    assert any("assertion" in n.lower() for n in rec["adequacy"])
+
+
+# --- executes field + determinism ---------------------------------------------
+
+
+def test_executes_field(tmp_path: Path) -> None:
+    (rec,) = strength.analyze(
+        tmp_path, [_claim("behavior", True, ("C4", "pytest:t.py::t"), ("C5", "shell:echo hi"))]
+    )
+    assert set(rec["executes"]) == {"pytest", "shell"}
+
+
+def test_analyze_is_deterministic(tmp_path: Path) -> None:
+    claims = [
+        _claim("behavior", True, ("C3", "symbol:a.py::F")),
+        _claim("quantity", True, ("C3", "py-const:c.py::T::30")),
+    ]
+    assert strength.analyze(tmp_path, claims) == strength.analyze(tmp_path, claims)
+
+
+# --- CLI surface: `dorian bindings` carries strength (JSON + human) ------------
+
+
+def test_cmd_bindings_surfaces_strength(fixture_repo: Path, capsys) -> None:
+    """A behavior claim backed only by a symbol: checker seals (exit 0), but `dorian bindings`
+    must surface its weak strength, high risk, and adequacy mismatch — JSON + human."""
+    import json
+
+    from dorian import cli
+
+    claims = {
+        "claims": [
+            {
+                "id": "vt-behavior",
+                "text": "verify_token authenticates the request.",
+                "kind": "behavior",
+                "load_bearing": True,
+                "checkers": [{"type": "C3", "program": "symbol:src/auth.py::verify_token"}],
+            }
+        ]
+    }
+    cp = fixture_repo / "claims.json"
+    cp.write_text(json.dumps(claims), encoding="utf-8")
+    assert (
+        cli.main(["--repo", str(fixture_repo), "verify", "docs/design.md", "--claims", str(cp)])
+        == 0
+    )
+    capsys.readouterr()
+
+    assert cli.main(["--repo", str(fixture_repo), "--json", "bindings", "docs/design.md"]) == 0
+    payload = json.loads(capsys.readouterr().out)
+    (diag,) = payload["claims"]
+    assert diag["strength"]["strength"] == "existence"
+    assert diag["strength"]["risk"] == "high"
+    assert any("adequacy_mismatch" in n for n in diag["strength"]["adequacy"])
+
+    assert cli.main(["--repo", str(fixture_repo), "bindings", "docs/design.md"]) == 0
+    out = capsys.readouterr().out
+    assert "strength: existence" in out and "risk: high" in out

From 6a8298c5ef2deeb1e8fd3b70bb399706201caf28 Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 17:40:21 +0530
Subject: [PATCH 02/13] feat(v1): trusted-base checker-source mode for
 public/fork PR safety (WP7)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

revalidate --checker-source {head,base} (default head) + Action checker_trust input.
base mode resolves each candidate claim's checker SPEC from the --since (base) ref and
runs it against PR-head sources, so a PR-added or PR-modified executable (C4 pytest / C5 shell)
checker is never executed and a rewritten checker cannot self-attest a verdict (base
spec wins; the change is surfaced as a trust-root note). Fail-closed: a missing/tampered
base sidecar ERRORs (never executed), never BROKEN, never green. Composes with
deny-exec. NOT a sandbox: a base-approved pytest checker can still run head code, and
every user-facing surface says so.

- revalidate.py: checker_source param, _load_base_warrant (integrity-checked base
  sidecar via gitio.file_at_ref), RevalResult.notes, text/md rendering of notes.
- cli.py/commands.py: --checker-source flag + DORIAN_CHECKER_SOURCE env fallback;
  base requires --since.
- action.yml: checker_trust input (default head) -> DORIAN_CHECKER_SOURCE; README + Inputs
  table updated (also documents the pre-existing deny_exec/deny_shell inputs).
- docs: TRUSTED_BASE_ACTION_DESIGN status -> IMPLEMENTED; SECURITY_BOUNDARY public-fork
  checklist updated (trust-root conditions met; sandboxing still out of scope).
- tests/test_trusted_base.py: the §6 exploit matrix (10 cases) — each "executed?" case
  proven by a sentinel touch that must NOT appear under base mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 V1_IMPLEMENTATION_TRACKER.md           |  12 +-
 action/README.md                       |  47 ++--
 action/action.yml                      |  14 ++
 docs/SECURITY_BOUNDARY.md              |  44 ++--
 docs/TRUSTED_BASE_ACTION_DESIGN.md     |  15 +-
 src/dorian/cli.py                      |  10 +
 src/dorian/commands.py                 |  16 ++
 src/dorian/revalidate.py               |  93 +++++++-
 tests/test_action_security_defaults.py |  12 +
 tests/test_render_md.py                |  11 +-
 tests/test_trusted_base.py             | 292 +++++++++++++++++++++++++
 11 files changed, 519 insertions(+), 47 deletions(-)
 create mode 100644 tests/test_trusted_base.py
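
Reviewer note (illustrative only, not part of the patch): the base-mode rule in the
commit message can be sketched roughly as below. This is a minimal sketch under stated
assumptions, not the code in revalidate.py: `Resolution`, `resolve_specs`, and the
dict-shaped sidecars are hypothetical names invented for this note, and `base=None`
stands in for a missing or tampered base sidecar.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resolution:
    """Hypothetical result type: which spec runs for each candidate claim."""

    to_run: dict[str, str] = field(default_factory=dict)  # claim id -> checker program
    errored: list[str] = field(default_factory=list)  # failed closed, never executed
    notes: list[str] = field(default_factory=list)  # trust-root advisories


def resolve_specs(
    head: dict[str, str],
    base: dict[str, str] | None,
    checker_source: str = "head",
) -> Resolution:
    """Pick the checker program to run for each claim id (assumed data shapes).

    `head` and `base` map claim id -> program string as read from each sidecar.
    """
    if checker_source not in ("head", "base"):
        raise ValueError(f"checker_source must be head|base (got {checker_source!r})")
    if checker_source == "head":
        # head mode: run whatever the checked-out sidecar says
        return Resolution(to_run=dict(head))

    out = Resolution()
    for claim_id, head_program in head.items():
        if base is None:
            # no trustworthy base sidecar: fail closed, execute nothing
            out.errored.append(claim_id)
            out.notes.append(f"{claim_id}: base sidecar missing or tampered; not executed")
        elif claim_id not in base:
            # a claim the PR added has no base-approved spec to run
            out.errored.append(claim_id)
            out.notes.append(f"{claim_id}: claim absent on base; not executed")
        else:
            # the base-approved spec wins; a PR-side rewrite is only reported
            out.to_run[claim_id] = base[claim_id]
            if base[claim_id] != head_program:
                out.notes.append(f"{claim_id}: checker spec changed on the PR; base spec used")
    return out
```

In this sketch a PR that rewrites a claim's checker to `shell:touch pwned` still gets
the base program run for that claim, and a claim that exists only on the PR head lands
in `errored` rather than being executed. As the commit message says, this selects
which spec runs and is not a sandbox.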

diff --git a/V1_IMPLEMENTATION_TRACKER.md b/V1_IMPLEMENTATION_TRACKER.md
index 3cf4254..44db552 100644
--- a/V1_IMPLEMENTATION_TRACKER.md
+++ b/V1_IMPLEMENTATION_TRACKER.md
@@ -100,12 +100,14 @@ Categories: IMPL=must-implement · TEST=must-test regression · DOC=must-documen
 | WP | Title | Status |
 |---|---|---|
 | WP1 | docs/evidence hygiene | TODO |
-| WP2 | checker-strength / claim-risk linter | TODO |
-| WP3 | Python structural checkers (py-signature, py-const) | TODO |
-| WP4 | semantic-context source search | TODO |
+| WP2 | checker-strength / claim-risk linter | DONE (strength.py; surfaced in `bindings` + binding-gate warn; 19 tests) |
+| WP3 | Python structural checkers (py-signature, py-const) | DONE (pyast.py + C3 subgrammars; 27 tests incl. e2e) |
+| WP4 | semantic-context source search (`code:`) | DONE (pyast.code_only_python + C3 `code:`; 12 tests) |
 | WP5 | multi-index binding (config-key) | TODO |
-| WP6 | C4 test-adequacy lint | TODO |
-| WP7 | trusted-base checker-source mode | TODO |
+| WP6 | C4 test-adequacy lint | DONE (strength.c4_adequacy; folded into WP2 tests) |
+| WP7 | trusted-base checker-source mode | DONE (revalidate --checker-source base + Action checker_trust; 10-case exploit matrix) |
 | WP8 | warrant-quality mutation harness | TODO |
 | WP9 | current-version benchmark results | TODO |
 | WP10 | V1 release prep / decision | TODO |
+
+Commits so far: `58b39e2` (WP3/4/2/6); trusted-base (WP7) lands in the commit that follows it.
diff --git a/action/README.md b/action/README.md
index a4eafc4..232cc58 100644
--- a/action/README.md
+++ b/action/README.md
@@ -81,16 +81,30 @@ claim so a broken fact re-verifies). See `SECURITY.md` and
     deny_exec: "true"   # C4/C5 ERROR instead of executing
 ```
 
-**Current recommendation: trusted/internal repositories.** Until a
-trusted-base mode exists (execute only checker specs already present on the
-base branch; parse/lint — never execute — new or changed PR sidecars; skip
-C5 `shell:` and other executable checkers in untrusted mode — designed in
-[`docs/TRUSTED_BASE_ACTION_DESIGN.md`](../docs/TRUSTED_BASE_ACTION_DESIGN.md),
-not yet implemented), this Action is recommended for repositories where
-everyone who can open a PR is already trusted to run code in CI, or with
-`deny_exec: true` for untrusted PRs. For public repositories, treat any PR that
-touches a `.warrant` file as a code change requiring the same review as a CI
-change.
+**Trusted-base mode (`checker_trust: base`).** This is the trust-root fix for the
+self-attested-verdict problem. With `checker_trust: base`, the Action resolves each
+claim's checker SPEC from the **base ref** and runs it against the PR-head sources, so
+a PR-added or PR-modified executable checker is never executed and a rewritten checker
+cannot self-attest a verdict — the base-approved spec wins, and the change is surfaced
+in the PR comment. A missing or tampered base sidecar **fails closed** (ERRORED, never
+executed). Implemented and proven by the
+[trusted-base test matrix](../docs/TRUSTED_BASE_ACTION_DESIGN.md).
+
+```yaml
+# public / forked-PR posture: trusted checker specs + no code execution
+- uses: ajaysurya1221/dorian/action@main
+  with:
+    checker_trust: base   # run only base-approved checker specs
+    deny_exec: "true"     # and refuse to execute even those (belt and braces)
+```
+
+**It is a checker-source trust root, not a sandbox.** A base-approved `pytest:` checker
+can still import and execute PR-head code, so for fully untrusted forks combine
+`checker_trust: base` **with** `deny_exec: true` (or external isolation). Default
+`checker_trust: head` is unchanged and correct for trusted/internal repositories, where
+everyone who can open a PR is already trusted to run code in CI. For public repositories,
+treat any PR that touches a `.warrant` file as a code change requiring the same review as
+a CI change.
 
 Hard rules either way:
 
@@ -107,11 +121,14 @@ Hard rules either way:
 
 ## Inputs
 
-| input     | default                                      | meaning                                                                  |
-| --------- | -------------------------------------------- | ------------------------------------------------------------------------ |
-| `fail_on` | `revoked`                                    | when to fail the step: `revoked` (exit 4 only), `degraded` (3 or 4), `never` |
-| `base`    | `${{ github.event.pull_request.base.sha }}`  | git ref passed to `dorian revalidate --since`                            |
-| `install` | `dorian-vwp`                                 | pip spec; pin `dorian-vwp==0.6.*`, or `.` for checkout installs          |
+| input           | default                                      | meaning                                                                  |
+| --------------- | -------------------------------------------- | ------------------------------------------------------------------------ |
+| `fail_on`       | `revoked`                                    | when to fail the step: `revoked` (exit 4 only), `degraded` (3 or 4), `never` |
+| `base`          | `${{ github.event.pull_request.base.sha }}`  | git ref passed to `dorian revalidate --since`                            |
+| `install`       | `dorian-vwp`                                 | pip spec; pin `dorian-vwp==0.6.*`, or `.` for checkout installs          |
+| `deny_exec`     | `false`                                      | refuse to run executable checkers (C4 pytest, C5 shell): they ERROR. For untrusted/fork PRs; fail-closed, not a sandbox |
+| `deny_shell`    | `false`                                      | narrower than `deny_exec`: block only C5 shell, still allow C4 pytest    |
+| `checker_trust` | `head`                                       | `head` runs the checked-out checker spec (trusted repos); `base` runs the base-ref spec so PR-authored executable checkers never run (public/fork PRs) |
 
 Until the first PyPI release of `dorian-vwp`, set `install` to a source spec:
 `install: 'dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git'`.
diff --git a/action/action.yml b/action/action.yml
index 7683d64..77fabdc 100644
--- a/action/action.yml
+++ b/action/action.yml
@@ -41,6 +41,17 @@ inputs:
       Narrower than deny_exec: block only C5 shell, still allow C4 pytest.
     required: false
     default: "false"
+  checker_trust:
+    description: >-
+      Which sidecar a claim's checker SPEC is read from (the sources checked are
+      always the PR head). 'head' (default) runs the checked-out spec — correct for
+      trusted/internal repos. 'base' resolves each spec from the base ref, so a
+      PR-added or PR-modified executable checker is never executed and a rewritten
+      checker cannot self-attest a verdict — for public repos taking forked PRs.
+      Fail-closed, NOT a sandbox: a base-approved pytest checker can still execute
+      PR-head code. See docs/TRUSTED_BASE_ACTION_DESIGN.md.
+    required: false
+    default: head
 
 runs:
   using: composite
@@ -64,6 +75,9 @@ runs:
         # unchanged. Set deny_exec: true for untrusted/fork PRs.
         DORIAN_DENY_EXEC: ${{ inputs.deny_exec }}
         DORIAN_DENY_SHELL: ${{ inputs.deny_shell }}
+        # checker_trust=base resolves checker specs from the base ref so a
+        # PR-authored executable checker never runs. 'head' (default) is unchanged.
+        DORIAN_CHECKER_SOURCE: ${{ inputs.checker_trust }}
       run: |
         set +e
         dorian sync
diff --git a/docs/SECURITY_BOUNDARY.md b/docs/SECURITY_BOUNDARY.md
index 6cff620..28e2bb3 100644
--- a/docs/SECURITY_BOUNDARY.md
+++ b/docs/SECURITY_BOUNDARY.md
@@ -58,10 +58,8 @@ dorian revalidate --since HEAD
 
 Do **not** run `verify` / `seal` / `revalidate` / `rebind` on claims from a
 source you do not trust without `--deny-exec` (`rebind` re-runs every checker to
-re-seal, so it executes code too). Do **not** market or wire up public-fork-PR CI
-as safe: the trusted-base design ([docs/TRUSTED_BASE_ACTION_DESIGN.md](TRUSTED_BASE_ACTION_DESIGN.md))
-that would make it safe is not implemented or tested yet. Do not use
-`pull_request_target` with an untrusted-head checkout.
+re-seal, so it executes code too). Do not use `pull_request_target` with an
+untrusted-head checkout.
 
 ```bash
 # untrusted context: remove the ability to execute code
@@ -72,14 +70,30 @@ DORIAN_DENY_EXEC=1 dorian revalidate --since origin/main
 A blocked checker ERRORs, so a blocked load-bearing claim cannot seal and cannot
 silently pass revalidation — deny-exec fails closed.
 
-## What must be true before public-fork CI can be recommended
-
-1. Checker programs are taken from the **trusted base ref**, never from untrusted
-   head, unless explicitly allowlisted.
-2. deny-exec (or stronger) is the **default** for fork PRs.
-3. There are tests that simulate a fork/head sidecar trying to execute shell and
-   prove the Action blocks it.
-4. No `pull_request_target` footgun in the documented workflow.
-
-Until all four hold, the honest statement is: **trusted/internal repositories,
-or `--deny-exec` everywhere else.**
+## Public-fork CI: `--checker-source base` (a trust root, not a sandbox)
+
+`dorian revalidate --checker-source base` (Action input `checker_trust: base`)
+resolves each claim's checker SPEC from the trusted **base ref**, then runs it
+against the PR-head sources. A PR-added or PR-modified executable (C4 `pytest:` / C5 `shell:`)
+checker is therefore never executed; a rewritten checker cannot self-attest a
+verdict (the base spec wins, and the change is surfaced); a missing or tampered
+base sidecar **fails closed** (ERRORED, never executed). The
+[trusted-base test matrix](TRUSTED_BASE_ACTION_DESIGN.md) proves each case with a
+filesystem side effect that must not appear.
+
+This is a **checker-source trust root, not a sandbox.** A base-approved `pytest:`
+checker can still import and execute PR-head code, so the honest recommendation for
+public forks is `checker_trust: base` **with `deny_exec: true`** (or stronger
+external isolation), never "safe for arbitrary fork PRs". The four conditions below
+now hold for the *trust-root* threat; sandboxing executed code remains out of scope.
+
+1. ✅ Checker specs are taken from the **trusted base ref** (`--checker-source base`).
+2. ✅ deny-exec is available and recommended for fork PRs (`deny_exec: true`).
+3. ✅ Tests simulate a fork/head sidecar trying to execute shell/pytest and prove it
+   is not run (`tests/test_trusted_base.py`).
+4. ✅ No `pull_request_target` in the documented workflow.
+
+The residual, stated plainly: even in base mode a base-approved code-executing
+checker runs PR-head code, so **without deny-exec or external sandboxing this is
+not safe for fully untrusted code.** For trusted/internal repos, `head` mode
+remains correct and unchanged.
diff --git a/docs/TRUSTED_BASE_ACTION_DESIGN.md b/docs/TRUSTED_BASE_ACTION_DESIGN.md
index c1a3446..dde610b 100644
--- a/docs/TRUSTED_BASE_ACTION_DESIGN.md
+++ b/docs/TRUSTED_BASE_ACTION_DESIGN.md
@@ -1,8 +1,13 @@
-# Trusted-base Action mode — design (not implemented)
-
-> **HUMAN REVIEW REQUIRED.** This is a design only. It changes the Action's security model and adds
-> checker-execution gating — it must not be implemented without explicit human sign-off and the test
-> matrix in §6. No code in this repo implements it yet.
+# Trusted-base Action mode — design + status
+
+> **STATUS: IMPLEMENTED (V1).** `dorian revalidate --checker-source {head,base}` and the Action
+> `checker_trust: head|base` input now implement this design. Default is `head` (today's behavior,
+> unchanged). The §6 test matrix is implemented in `tests/test_trusted_base.py` (PR-added/modified
+> executable checkers never execute — proven with a sentinel side effect; missing/tampered base
+> sidecar fails closed; a rewritten checker is surfaced as a trust-root change; deny-exec composes).
+> The non-sandbox caveat in §2/§7 still holds and is stated in user docs: a base-approved
+> `pytest:`/`shell:` checker can still execute PR-head code, so `base` mode is a *checker-source trust
+> root*, not a sandbox.
 
 ## 1. Problem
 
diff --git a/src/dorian/cli.py b/src/dorian/cli.py
index 0ddd756..e257fac 100644
--- a/src/dorian/cli.py
+++ b/src/dorian/cli.py
@@ -182,6 +182,16 @@ def build_parser() -> argparse.ArgumentParser:
         help="output format; md is a PR-comment body for the GitHub Action",
     )
     rv.add_argument("--enable-c2lite", action="store_true")
+    rv.add_argument(
+        "--checker-source",
+        choices=["head", "base"],
+        default=None,
+        help="which sidecar a claim's checker SPEC is read from (sources checked are"
+        " always the working tree). 'head' (default) runs the checked-out spec —"
+        " trusted/internal repos. 'base' resolves each spec from the --since (base) ref"
+        " so a PR-added or PR-modified executable checker is never executed — for"
+        " public/fork PRs; fail-closed, NOT a sandbox. Env: DORIAN_CHECKER_SOURCE.",
+    )
     _add_exec_policy_flags(rv)
 
     rp = sub.add_parser("report", help="event-log digest")
diff --git a/src/dorian/commands.py b/src/dorian/commands.py
index dff6848..0e23f72 100644
--- a/src/dorian/commands.py
+++ b/src/dorian/commands.py
@@ -623,6 +623,21 @@ def cmd_revalidate(args: argparse.Namespace) -> int:
         return EXIT_USAGE
     if _missing_repo(repo, "revalidate"):
         return EXIT_USAGE
+    # flag wins; env DORIAN_CHECKER_SOURCE is the Action's fallback (head|base)
+    checker_source = args.checker_source or os.environ.get("DORIAN_CHECKER_SOURCE", "head").strip()
+    if checker_source not in ("head", "base"):
+        print(
+            f"dorian revalidate: --checker-source must be head|base (got {checker_source!r})",
+            file=sys.stderr,
+        )
+        return EXIT_USAGE
+    if checker_source == "base" and args.since is None:
+        print(
+            "dorian revalidate: --checker-source base requires --since <base ref>"
+            " (the trusted checker spec is read from the base ref)",
+            file=sys.stderr,
+        )
+        return EXIT_USAGE
     try:
         result = revalidate(
             repo,
@@ -632,6 +647,7 @@ def cmd_revalidate(args: argparse.Namespace) -> int:
             policy=ExecutionPolicy.from_flags_and_env(
                 deny_exec=args.deny_exec, deny_shell=args.deny_shell
             ),
+            checker_source=checker_source,
         )
     # user-input failures, before the broader ValueError in _SIDECAR_ERRORS:
     # an unresolvable --since ref or an unreadable --changed-paths listing
diff --git a/src/dorian/revalidate.py b/src/dorian/revalidate.py
index f60b953..55fab14 100644
--- a/src/dorian/revalidate.py
+++ b/src/dorian/revalidate.py
@@ -21,7 +21,7 @@
 
 import json
 import sqlite3
-from dataclasses import asdict, dataclass, field
+from dataclasses import asdict, dataclass, field, replace
 from datetime import UTC, datetime
 from pathlib import Path
 
@@ -58,6 +58,9 @@ class RevalResult:
     # claim/trust states are untouched): {warrant_id, artifact_uri, depth, via}
     # where via is the newly broken upstream warrant
     recalled: list[dict] = field(default_factory=list)
+    # checker-source=base advisories: a checker spec that changed on the PR (so the
+    # base-approved spec was run instead), or a claim/sidecar skipped fail-closed
+    notes: list[str] = field(default_factory=list)
     candidates: int = 0
     exit_code: int = 0
 
@@ -69,17 +72,36 @@ def revalidate(
     changed_paths_file: Path | None = None,
     enable_c2lite: bool = False,
     policy: ExecutionPolicy | None = None,
+    checker_source: str = "head",
 ) -> RevalResult:
     """Re-check claims bound to the changed paths; one of `since` (git ref to
     diff from) or `changed_paths_file` (one path per line) is required. If both
     are given, `changed_paths_file` takes precedence and `since` is ignored
-    (the CLI rejects the combination)."""
+    (the CLI rejects the combination).
+
+    checker_source (head | base; default head) selects which sidecar a candidate
+    claim's checker SPEC is read from — orthogonal to which SOURCES are checked
+    (always the working tree / PR head). `head` is today's behavior exactly. `base`
+    is the public/fork-PR hardening: each claim's checker spec is resolved from the
+    `since` (base) ref's sidecar, so a PR-added or PR-modified executable checker is
+    never executed — only maintainer-approved (base) checker specs run. It fails
+    closed (a missing/tampered base sidecar, or a claim absent on base, ERRORs and
+    runs nothing) and it is NOT a sandbox: a base-approved C4 `pytest:` checker can
+    still import and execute PR-head code (see docs/TRUSTED_BASE_ACTION_DESIGN.md)."""
     if since is None and changed_paths_file is None:
         raise ValueError("provide since=<git ref> or changed_paths_file=<path>")
+    if checker_source not in ("head", "base"):
+        raise ValueError(f"checker_source must be 'head' or 'base', got {checker_source!r}")
+    if checker_source == "base" and since is None:
+        raise ValueError(
+            "checker-source=base needs --since <base ref>: the trusted checker spec is"
+            " resolved from the base ref, which --changed-paths does not provide"
+        )
     repo = repo.resolve()
     # under deny-exec/deny-shell a blocked C4/C5-shell recheck ERRORs (exit 5),
     # never silently PASSes and never folds to BROKEN — trigger-vs-truth intact
     exec_policy = policy if policy is not None else ExecutionPolicy()
+    base_cache: dict[str, Warrant | None] = {}  # checker-source=base: per-artifact base sidecar
     if changed_paths_file is not None:
         # read exactly once, before any store work: a failure here is bad caller
         # input (distinct ChangedPathsError), never a sidecar integrity error
@@ -137,11 +159,42 @@ def revalidate(
                     kind="claim.stale",
                     cause={"changed": cause},
                 )
-                if not claim.checkers:
+                if not claim.checkers and checker_source != "base":
                     continue  # unbacked claim: stale is recorded, nothing to re-check
-                state, detail, relocated = _check_claim(
-                    repo, claim, entries, renames, enable_c2lite, exec_policy
-                )
+                # checker-source=base: run the BASE-approved checker spec (resolved from
+                # the `since` ref) against head sources, never the PR's spec. Fail closed
+                # (ERRORED, never executed) when the base spec cannot be trusted.
+                eff_claim = claim
+                skip_reason: str | None = None
+                if checker_source == "base":
+                    base_w = _load_base_warrant(repo, since, warrant.artifact_uri, base_cache)
+                    if base_w is None:
+                        skip_reason = (
+                            "checker-source=base: no readable base sidecar for this artifact"
+                            " (fail-closed; not executed)"
+                        )
+                    else:
+                        base_claim = next((c for c in base_w.claims if c.id == cid), None)
+                        if base_claim is None:
+                            skip_reason = (
+                                "checker-source=base: claim not present on base ref"
+                                " (PR-added checker; not executed)"
+                            )
+                        else:
+                            if base_claim.checkers != claim.checkers:
+                                result.notes.append(
+                                    f"{warrant.artifact_uri}: {cid}: checker spec changed on PR"
+                                    " — ran base-approved spec (checker-source=base)"
+                                )
+                            eff_claim = replace(claim, checkers=base_claim.checkers)
+                if skip_reason is not None:
+                    state, detail, relocated = "ERRORED", skip_reason, False
+                elif not eff_claim.checkers:
+                    continue  # nothing to run (head unbacked, or base claim unbacked)
+                else:
+                    state, detail, relocated = _check_claim(
+                        repo, eff_claim, entries, renames, enable_c2lite, exec_policy
+                    )
                 changed_state = fold_mod.apply_claim_state(
                     conn, wid, cid, state, actor=actor, cause={"detail": detail}
                 )
@@ -187,6 +240,28 @@ def revalidate(
         conn.close()
 
 
+def _load_base_warrant(
+    repo: Path, base_ref: str, artifact_uri: str, cache: dict[str, Warrant | None]
+) -> Warrant | None:
+    """The artifact's sidecar AS IT EXISTS ON THE BASE REF (checker-source=base), or
+    None if it is absent, unreadable, or its content-addressed id does not verify (a
+    tampered base sidecar). Fail-closed by construction: None makes the caller skip,
+    never execute the PR's checker. Cached per artifact for the run."""
+    if artifact_uri in cache:
+        return cache[artifact_uri]
+    warrant: Warrant | None = None
+    data = gitio.file_at_ref(repo, base_ref, artifact_uri + ".warrant")
+    if data is not None:
+        try:
+            candidate = Warrant.from_dict(json.loads(data.decode("utf-8")))
+            if Warrant.compute_id(candidate.body_dict()) == candidate.id:
+                warrant = candidate  # integrity-valid base sidecar
+        except (ValueError, KeyError, TypeError, UnicodeDecodeError):
+            warrant = None  # malformed/tampered base sidecar: fail closed
+    cache[artifact_uri] = warrant
+    return warrant
+
+
 def _claim_paths(
     claim: Claim, entries: dict[str, ReadSetEntry], renames: dict[str, str]
 ) -> set[str]:
@@ -281,6 +356,8 @@ def render_text(result: RevalResult) -> str:
         for e in result.recalled:
             wid, uri = e["warrant_id"], e["artifact_uri"]
             lines.append(f"recalled  {wid[:23]} {uri}  depth={e['depth']}")
+    for note in result.notes:
+        lines.append(f"note      {note}")
     return "\n".join(lines) + "\n"
 
 
@@ -358,6 +435,10 @@ def render_md(result: RevalResult) -> str:
         lines += ["", "Recalled downstream (flagged, not re-checked):"]
         for e in result.recalled:
             lines.append(f"- `{e['artifact_uri']}` (depth {e['depth']})")
+    if result.notes:  # checker-source=base advisories (PR-changed / skipped specs)
+        lines += ["", "Checker-source notes (trusted-base mode):"]
+        for note in result.notes:
+            lines.append(f"- {_md_cell(note)}")
 
     checks = sum(map(len, (result.broken, result.relocated, result.errored, result.passed)))
     meaning = _EXIT_MEANINGS.get(result.exit_code, "unknown")
diff --git a/tests/test_action_security_defaults.py b/tests/test_action_security_defaults.py
index 7f3b47a..2bee860 100644
--- a/tests/test_action_security_defaults.py
+++ b/tests/test_action_security_defaults.py
@@ -35,6 +35,18 @@ def test_action_wires_deny_flags_through_env_fallback() -> None:
     assert "DORIAN_DENY_SHELL: ${{ inputs.deny_shell }}" in text
 
 
+def test_action_exposes_checker_trust_defaulting_head() -> None:
+    inputs = _action()["inputs"]
+    assert "checker_trust" in inputs
+    # default head = today's behavior; trusted repos are unchanged unless they opt in
+    assert str(inputs["checker_trust"]["default"]) == "head"
+
+
+def test_action_wires_checker_trust_through_env() -> None:
+    text = ACTION_YML.read_text(encoding="utf-8")
+    assert "DORIAN_CHECKER_SOURCE: ${{ inputs.checker_trust }}" in text
+
+
 def test_action_does_not_recommend_pull_request_target() -> None:
     readme = ACTION_README.read_text(encoding="utf-8")
     # it is named only to forbid it
diff --git a/tests/test_render_md.py b/tests/test_render_md.py
index 4667562..1a6be12 100644
--- a/tests/test_render_md.py
+++ b/tests/test_render_md.py
@@ -246,13 +246,22 @@ def test_action_yml_is_valid_composite() -> None:
 
     assert data["runs"]["using"] == "composite"
     inputs = data["inputs"]
-    assert set(inputs) == {"fail_on", "base", "install", "deny_exec", "deny_shell"}
+    assert set(inputs) == {
+        "fail_on",
+        "base",
+        "install",
+        "deny_exec",
+        "deny_shell",
+        "checker_trust",
+    }
     assert inputs["fail_on"]["default"] == "revoked"
     assert inputs["base"]["default"] == "${{ github.event.pull_request.base.sha }}"
     assert inputs["install"]["default"] == "dorian-vwp"
     # deny-exec/deny-shell default OFF so trusted/internal repos are unchanged
     assert str(inputs["deny_exec"]["default"]) == "false"
     assert str(inputs["deny_shell"]["default"]) == "false"
+    # checker_trust defaults to head (today's behavior); base is the opt-in fork mode
+    assert str(inputs["checker_trust"]["default"]) == "head"
     for name in inputs:
         assert inputs[name]["description"].strip(), f"input {name} must be documented"
 
diff --git a/tests/test_trusted_base.py b/tests/test_trusted_base.py
new file mode 100644
index 0000000..fd8aa08
--- /dev/null
+++ b/tests/test_trusted_base.py
@@ -0,0 +1,292 @@
+"""Trusted-base checker-source mode (WP7): `revalidate --checker-source base`.
+
+The exploit class: the Action runs the checker programs found in the CHECKED-OUT
+(PR/head) `.warrant` sidecars. On a forked PR, the attacker controls those sidecars,
+so a PR-added or PR-modified C4 `pytest:`/C5 `shell:` checker would execute attacker
+code on the runner, and a rewritten non-executable checker could self-attest a green
+verdict. `--checker-source base` resolves every claim's checker SPEC from the trusted
+base ref instead, runs it against the PR-head SOURCES, and fails closed when the base
+spec cannot be trusted.
+
+This is the security matrix from docs/TRUSTED_BASE_ACTION_DESIGN.md §6. Each
+"executed?" case proves it with a filesystem side effect (a sentinel `touch`): the
+sentinel must NOT appear under base mode. It is NOT a sandbox — a base-approved
+pytest checker can still execute head code — which the design and these tests state.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+
+from conftest import commit_all, git, write
+from dorian import cli
+from dorian.model import (
+    CheckerSpec,
+    Claim,
+    FoldPolicy,
+    ProducedBy,
+    ReadSet,
+    Warrant,
+    sha256_hex,
+)
+from dorian.policy import ExecutionPolicy
+from dorian.revalidate import revalidate
+from dorian.seal import seal_artifact
+
+PB = ProducedBy(runner="manual", captured_at="2026-01-01T00:00:00Z")
+
+
+def _repo(tmp_path: Path) -> Path:
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    git(repo, "init", "-q", "-b", "main")
+    write(repo, "src/config.py", "TIMEOUT = 30\n")
+    write(repo, "src/auth.py", "def verify_token(t):\n    return bool(t)\n")
+    write(repo, "note.md", "# note\n\nThe timeout is 30.\n")
+    commit_all(repo, "init")  # seal needs a resolvable HEAD
+    return repo
+
+
+def _forge_head_warrant(repo: Path, artifact_uri: str, claims: list[Claim]) -> Warrant:
+    """Write a content-addressed `.warrant` to the working tree directly, simulating a
+    sidecar a forked PR fully controls (a real attacker computes the same valid id)."""
+    data = (repo / artifact_uri).read_bytes()
+    w = Warrant.create(
+        artifact_uri=artifact_uri,
+        artifact_hash=sha256_hex(data),
+        git_ref=git(repo, "rev-parse", "HEAD"),
+        produced_by=PB,
+        read_set=(),
+        claims=tuple(claims),
+        fold_policy=FoldPolicy(),
+        sealed_at="2026-01-01T00:00:00Z",
+    )
+    w.dump(w.sidecar_path(repo))
+    return w
+
+
+def _claim(cid: str, program: str, ctype: str = "C3", *, watch=(), load_bearing=True) -> Claim:
+    return Claim(
+        id=cid,
+        text=f"claim {cid}",
+        kind="reference",
+        load_bearing=load_bearing,
+        checkers=(CheckerSpec(type=ctype, program=program, watch=tuple(watch)),),
+    )
+
+
+# --- matrix #2: a base-unchanged checker still runs and catches real drift ----
+
+
+def test_base_unchanged_checker_catches_drift(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    seal_artifact(
+        repo,
+        "note.md",
+        ReadSet(entries=(), produced_by=PB),
+        [_claim("t", r"regex:src/config.py::TIMEOUT\s*=\s*30")],
+    )
+    base = commit_all(repo, "base: sealed warrant")
+    write(repo, "src/config.py", "TIMEOUT = 10\n")  # head drift
+    res = revalidate(repo, since=base, checker_source="base")
+    assert {cid for _, cid, _ in res.broken} == {"t"}  # base spec ran, caught the drift
+    assert res.exit_code == cli.EXIT_REVOKED
+
+
+# --- matrix #3: a PR-ADDED executable checker is never executed ---------------
+
+
+def test_pr_added_shell_checker_does_not_execute(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    # base warrant carries only a benign non-executing claim
+    seal_artifact(
+        repo, "note.md", ReadSet(entries=(), produced_by=PB), [_claim("ok", "path:src/config.py")]
+    )
+    base = commit_all(repo, "base")
+
+    base_sentinel = tmp_path / "BASE_PWNED"
+    head_sentinel = tmp_path / "HEAD_PWNED"
+    write(repo, "src/auth.py", "def verify_token(t):\n    return t\n")  # head source change
+
+    def forge(sentinel: Path) -> None:
+        _forge_head_warrant(
+            repo,
+            "note.md",
+            [
+                _claim("ok", "path:src/config.py"),
+                _claim("evil", f"shell:touch {sentinel}", "C5", watch=("src/auth.py",)),
+            ],
+        )
+
+    # base mode: the PR-added 'evil' claim is absent on base -> skipped, never executed
+    forge(base_sentinel)
+    res = revalidate(repo, since=base, checker_source="base")
+    assert not base_sentinel.exists(), "PR-added shell checker MUST NOT execute under base mode"
+    assert "evil" in {cid for _, cid, _ in res.errored}
+
+    # head mode (the unsafe default for forks): proves the checker is genuinely live
+    forge(head_sentinel)
+    revalidate(repo, since=base, checker_source="head")
+    assert head_sentinel.exists(), "sanity: head mode does run the PR's shell checker"
+
+
+# --- matrix #4: a PR-MODIFIED executable checker is never executed ------------
+
+
+def test_pr_modified_executable_checker_uses_base_spec(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    seal_artifact(
+        repo,
+        "note.md",
+        ReadSet(entries=(), produced_by=PB),
+        [_claim("c", r"regex:src/config.py::TIMEOUT\s*=\s*30", watch=("src/config.py",))],
+    )
+    base = commit_all(repo, "base: regex checker")
+
+    sentinel = tmp_path / "MOD_PWNED"
+    write(repo, "src/config.py", "TIMEOUT = 10\n")  # base regex would FAIL on this
+    write(repo, "src/auth.py", "def verify_token(t):\n    return t\n")
+    # PR rewrites claim 'c' from the base regex to a shell command
+    _forge_head_warrant(
+        repo, "note.md", [_claim("c", f"shell:touch {sentinel}", "C5", watch=("src/auth.py",))]
+    )
+
+    res = revalidate(repo, since=base, checker_source="base")
+    assert not sentinel.exists(), "PR-modified executable checker MUST NOT execute"
+    assert {cid for _, cid, _ in res.broken} == {"c"}  # the BASE regex ran (and failed)
+    assert any("changed on PR" in n for n in res.notes)  # trust-root change surfaced
+
+
+# --- trust-root change: a PR-weakened NON-executable checker -----------------
+
+
+def test_pr_weakened_nonexec_checker_is_reported_and_base_spec_wins(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    seal_artifact(
+        repo,
+        "note.md",
+        ReadSet(entries=(), produced_by=PB),
+        [_claim("c", r"regex:src/config.py::TIMEOUT\s*=\s*30", watch=("src/config.py",))],
+    )
+    base = commit_all(repo, "base: strict regex")
+    write(repo, "src/config.py", "TIMEOUT = 10\n")  # the fact is now false
+    # PR weakens the checker to a mere existence check that always passes
+    _forge_head_warrant(
+        repo, "note.md", [_claim("c", "path:src/config.py", watch=("src/config.py",))]
+    )
+
+    # head mode: the weakening attack succeeds — the existence check passes
+    head = revalidate(repo, since=base, checker_source="head")
+    assert not head.broken and {cid for _, cid, _ in head.passed} == {"c"}
+
+    # base mode: the base regex spec wins -> BROKEN, and the spec change is surfaced
+    res = revalidate(repo, since=base, checker_source="base")
+    assert {cid for _, cid, _ in res.broken} == {"c"}
+    assert any("changed on PR" in n for n in res.notes)
+
+
+# --- matrix #6: a missing/unreadable base sidecar fails closed ----------------
+
+
+def test_missing_base_sidecar_fails_closed(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    base = git(repo, "rev-parse", "HEAD")  # base has the source but NO warrant for note.md
+    write(repo, "src/auth.py", "def verify_token(t):\n    return t\n")
+    sentinel = tmp_path / "NOBASE_PWNED"
+    _forge_head_warrant(
+        repo, "note.md", [_claim("c", f"shell:touch {sentinel}", "C5", watch=("src/auth.py",))]
+    )
+    res = revalidate(repo, since=base, checker_source="base")
+    assert not sentinel.exists(), "no base sidecar -> must not execute the PR checker"
+    assert "c" in {cid for _, cid, _ in res.errored}
+    assert res.exit_code == cli.EXIT_ERRORED  # ERRORED, never BROKEN, never green
+
+
+# --- deny-shell composes with base mode ---------------------------------------
+
+
+def test_base_mode_with_deny_shell_errors_base_executable(tmp_path: Path) -> None:
+    """Even a BASE-approved executable checker is blocked under deny-exec/deny-shell:
+    the two controls compose. base mode picks the trusted spec; the policy still
+    refuses to run it -> ERRORED, never executed."""
+    repo = _repo(tmp_path)
+    sentinel = tmp_path / "DENY_PWNED"
+    seal_artifact(
+        repo,
+        "note.md",
+        ReadSet(entries=(), produced_by=PB),
+        [_claim("c", f"shell:touch {sentinel}", "C5", watch=("src/config.py",))],
+    )
+    sentinel.unlink(missing_ok=True)  # seal-time run created it; reset for the assertion
+    base = commit_all(repo, "base: shell checker")
+    write(repo, "src/config.py", "TIMEOUT = 31\n")
+    res = revalidate(
+        repo,
+        since=base,
+        checker_source="base",
+        policy=ExecutionPolicy(allow_exec=True, allow_shell=False),
+    )
+    assert not sentinel.exists()
+    assert "c" in {cid for _, cid, _ in res.errored}
+
+
+# --- head is the default; base requires --since -------------------------------
+
+
+def test_head_is_default_and_unchanged(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    seal_artifact(
+        repo,
+        "note.md",
+        ReadSet(entries=(), produced_by=PB),
+        [_claim("t", r"regex:src/config.py::TIMEOUT\s*=\s*30")],
+    )
+    base = commit_all(repo, "base")
+    write(repo, "src/config.py", "TIMEOUT = 10\n")
+    default = revalidate(repo, since=base)
+    head = revalidate(repo, since=base, checker_source="head")
+    assert {c for _, c, _ in default.broken} == {c for _, c, _ in head.broken} == {"t"}
+    assert default.notes == [] and head.notes == []
+
+
+def test_base_mode_requires_since(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    changed = repo / "changed.txt"
+    changed.write_text("src/config.py\n", encoding="utf-8")
+    with pytest.raises(ValueError, match="base"):
+        revalidate(repo, changed_paths_file=changed, checker_source="base")
+
+
+def test_cli_base_without_since_is_usage_error(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    changed = repo / "changed.txt"
+    changed.write_text("src/config.py\n", encoding="utf-8")
+    rc = cli.main(
+        [
+            "--repo",
+            str(repo),
+            "revalidate",
+            "--changed-paths",
+            str(changed),
+            "--checker-source",
+            "base",
+        ]
+    )
+    assert rc == cli.EXIT_USAGE
+
+
+def test_cli_env_fallback_selects_base(tmp_path: Path, monkeypatch) -> None:
+    repo = _repo(tmp_path)
+    seal_artifact(
+        repo,
+        "note.md",
+        ReadSet(entries=(), produced_by=PB),
+        [_claim("t", r"regex:src/config.py::TIMEOUT\s*=\s*30", watch=("src/config.py",))],
+    )
+    base = commit_all(repo, "base")
+    write(repo, "src/config.py", "TIMEOUT = 10\n")
+    monkeypatch.setenv("DORIAN_CHECKER_SOURCE", "base")
+    rc = cli.main(["--repo", str(repo), "revalidate", "--since", base])
+    assert rc == cli.EXIT_REVOKED  # base spec ran and caught the drift

From 04ab60bb4bde4947cffc2a77f46ce421aeb889e3 Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 17:51:29 +0530
Subject: [PATCH 03/13] =?UTF-8?q?feat(v1):=20multi-index=20binding=20?=
 =?UTF-8?q?=E2=80=94=20config-key=20index=20for=20TOML/JSON=20(WP5)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Extends binding beyond Python definers/console-scripts to config keys in tracked
.toml/.json files: a claim mentioning a config key is re-checked when the defining
config file changes. Conservative and trigger-only (never proves truth):

- symbol_index.config_key_index: key -> tracked .toml/.json files + unparseable list.
  YAML deliberately excluded (parsing needs a third-party dep; core stays zero-dep).
- claim_config_watch_paths + claim_watch_paths (unified symbol+script+config union);
  verify/rebind now widen with the merged watch set.
- ambiguous_config_mentions: a key in >1 file is skipped (a wrong watch is a false
  alarm) and surfaced via verify warnings + bind-suggest, never guessed.
- unparseable supported config files are surfaced as a diagnostic, never silent.
- bind-suggest gains provenance (bind (symbol) vs bind (config)) + ambiguous-config
  + unparseable-config lines; JSON adds bind_config/ambiguous_config/unparseable_config.
- config_key_index degrades to empty on a non-git repo (never blocks).

Updated test_symbol_index pyproject-script expectation (a script-name claim now also
watches pyproject.toml where the script is declared) and the trusted-base design
doc-guard (now IMPLEMENTED). 639 non-slow tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 V1_IMPLEMENTATION_TRACKER.md   |   2 +-
 src/dorian/commands.py         |  60 ++++++++++--
 src/dorian/symbol_index.py     | 106 +++++++++++++++++++++
 tests/test_claude_code_docs.py |  10 +-
 tests/test_config_binding.py   | 168 +++++++++++++++++++++++++++++++++
 tests/test_symbol_index.py     |   5 +-
 6 files changed, 336 insertions(+), 15 deletions(-)
 create mode 100644 tests/test_config_binding.py

diff --git a/V1_IMPLEMENTATION_TRACKER.md b/V1_IMPLEMENTATION_TRACKER.md
index 44db552..db61176 100644
--- a/V1_IMPLEMENTATION_TRACKER.md
+++ b/V1_IMPLEMENTATION_TRACKER.md
@@ -103,7 +103,7 @@ Categories: IMPL=must-implement · TEST=must-test regression · DOC=must-documen
 | WP2 | checker-strength / claim-risk linter | DONE (strength.py; surfaced in `bindings` + binding-gate warn; 19 tests) |
 | WP3 | Python structural checkers (py-signature, py-const) | DONE (pyast.py + C3 subgrammars; 27 tests incl. e2e) |
 | WP4 | semantic-context source search (`code:`) | DONE (pyast.code_only_python + C3 `code:`; 12 tests) |
-| WP5 | multi-index binding (config-key) | TODO |
+| WP5 | multi-index binding (config-key) | DONE (symbol_index.config_key_index + claim_watch_paths; TOML/JSON only, YAML excluded = zero-dep; provenance in bind-suggest; ambiguity + unparseable surfaced; 9 tests) |
 | WP6 | C4 test-adequacy lint | DONE (strength.c4_adequacy; folded into WP2 tests) |
 | WP7 | trusted-base checker-source mode | DONE (revalidate --checker-source base + Action checker_trust; 10-case exploit matrix) |
 | WP8 | warrant-quality mutation harness | TODO |
diff --git a/src/dorian/commands.py b/src/dorian/commands.py
index 0e23f72..d6feb9b 100644
--- a/src/dorian/commands.py
+++ b/src/dorian/commands.py
@@ -247,14 +247,17 @@ def cmd_verify(args: argparse.Namespace) -> int:
         # claims mention (even when no checker named them): the symbol-definer watch
         # the seal adds is then also captured + hashed + scope-linted honestly
         paths = referenced_paths(claims)
-        symbol_watch = symbol_index.claim_symbol_watch_paths(repo, claims)
+        # multi-index binding: Python symbol-definers + pyproject scripts + config keys
+        symbol_watch = symbol_index.claim_watch_paths(repo, claims)
         for path in sorted({p for ps in symbol_watch.values() for p in ps}):
             if path not in paths:
                 paths.append(path)
         readset = parse_manual(paths, repo)
-        # a load-bearing claim naming an AMBIGUOUS symbol (>1 definer) is left unbound; do
-        # not let that skip be silent — warn so the author binds it explicitly (see A3)
+        # a load-bearing claim naming an AMBIGUOUS symbol/config key (>1 definer) is left
+        # unbound; do not let that skip be silent — warn so the author binds it explicitly
         ambiguous = symbol_index.ambiguous_symbol_mentions(repo, claims)
+        ambiguous_config = symbol_index.ambiguous_config_mentions(repo, claims)
+        _, unparseable_config = symbol_index.config_key_index(repo)
     except (ValueError, OSError, gitio.GitError) as exc:
         print(f"dorian verify: {exc}", file=sys.stderr)
         return EXIT_USAGE
@@ -293,6 +296,20 @@ def cmd_verify(args: argparse.Namespace) -> int:
                 "checker or qualify the reference",
                 file=sys.stderr,
             )
+    for cid, cfg in ambiguous_config.items():
+        for key, files in cfg.items():
+            print(
+                f"dorian verify: warning: load-bearing claim {cid!r} mentions config key "
+                f"{key!r} (defined in {len(files)} config files); left unbound — name the file "
+                "in a checker",
+                file=sys.stderr,
+            )
+    for cfg_path in unparseable_config:
+        print(
+            f"dorian verify: warning: config file {cfg_path!r} could not be parsed; its keys "
+            "are not indexed for binding (a claim mentioning them may be silently unbound)",
+            file=sys.stderr,
+        )
     backed = sum(1 for c in claims if c.backed)
     print(warrant.id)
     print(
@@ -482,8 +499,12 @@ def cmd_bind_suggest(args: argparse.Namespace) -> int:
     except (ValueError, OSError) as exc:
         print(f"dorian bind-suggest: {exc}", file=sys.stderr)
         return EXIT_USAGE
+    # multi-index binding with provenance: symbol-definer/script vs config-key
     watch = symbol_index.claim_symbol_watch_paths(repo, claims)
+    config_watch = symbol_index.claim_config_watch_paths(repo, claims)
     ambiguous = symbol_index.ambiguous_symbol_mentions(repo, claims)
+    ambiguous_config = symbol_index.ambiguous_config_mentions(repo, claims)
+    _, unparseable_config = symbol_index.config_key_index(repo)
     suggestions: list[dict] = []
     for c in claims:
         try:
@@ -491,17 +512,38 @@ def cmd_bind_suggest(args: argparse.Namespace) -> int:
         except ValueError:
             covered = set()  # C1 span / C5 shell: no auto-derivable read-set to compare
         bind = [f for f in watch.get(c.id, ()) if f not in covered]
+        bind_config = [f for f in config_watch.get(c.id, ()) if f not in covered]
         amb = {s: list(files) for s, files in ambiguous.get(c.id, {}).items()}
-        if bind or amb:
-            suggestions.append({"claim_id": c.id, "bind": bind, "ambiguous": amb})
+        amb_cfg = {k: list(files) for k, files in ambiguous_config.get(c.id, {}).items()}
+        if bind or bind_config or amb or amb_cfg:
+            suggestions.append(
+                {
+                    "claim_id": c.id,
+                    "bind": bind,  # symbol-definer / console-script provenance
+                    "bind_config": bind_config,  # config-key provenance
+                    "ambiguous": amb,
+                    "ambiguous_config": amb_cfg,
+                }
+            )
     if args.json:
-        print(json.dumps({"suggestions": suggestions}, sort_keys=True))
+        print(
+            json.dumps(
+                {"suggestions": suggestions, "unparseable_config": list(unparseable_config)},
+                sort_keys=True,
+            )
+        )
         return EXIT_OK
     for s in suggestions:
         if s["bind"]:
-            print(f"{s['claim_id']}  bind: {', '.join(s['bind'])}")
+            print(f"{s['claim_id']}  bind (symbol): {', '.join(s['bind'])}")
+        if s["bind_config"]:
+            print(f"{s['claim_id']}  bind (config): {', '.join(s['bind_config'])}")
         for sym, files in sorted(s["ambiguous"].items()):
-            print(f"{s['claim_id']}  ambiguous: {sym} ({len(files)} definers, unbound)")
+            print(f"{s['claim_id']}  ambiguous symbol: {sym} ({len(files)} definers, unbound)")
+        for key, files in sorted(s["ambiguous_config"].items()):
+            print(f"{s['claim_id']}  ambiguous config: {key} ({len(files)} files, unbound)")
+    for cfg in unparseable_config:
+        print(f"unparseable config (keys not indexed for binding): {cfg}")
     print(f"{len(suggestions)} claim(s) with binding suggestions")
     return EXIT_OK
 
@@ -543,7 +585,7 @@ def cmd_rebind(args: argparse.Namespace) -> int:
             file=sys.stderr,
         )
         return EXIT_USAGE
-    symbol_watch = symbol_index.claim_symbol_watch_paths(repo, claims)
+    symbol_watch = symbol_index.claim_watch_paths(repo, claims)  # symbol-definer + config-key
     new_paths = {p for ps in symbol_watch.values() for p in ps}
     already_watched = {w for c in claims for spec in c.checkers for w in spec.watch}
     if new_paths <= already_watched:
diff --git a/src/dorian/symbol_index.py b/src/dorian/symbol_index.py
index 8ad57c9..f35aff4 100644
--- a/src/dorian/symbol_index.py
+++ b/src/dorian/symbol_index.py
@@ -26,6 +26,7 @@ class from docs/NEXT_ALGORITHMIC_BETS.md #1 — where a claim about a symbol
 from __future__ import annotations
 
 import ast
+import json
 import tomllib
 from pathlib import Path
 
@@ -36,6 +37,11 @@ class from docs/NEXT_ALGORITHMIC_BETS.md #1 — where a claim about a symbol
 _MAX_FILE_BYTES = 1 << 20  # skip files > 1 MiB (mirrors bindings); parsing them is wasteful
 _DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
 
+# config-key index: stdlib-parseable formats only. YAML is deliberately excluded —
+# parsing it needs a third-party dep and dorian's core has zero runtime deps.
+_CONFIG_SUFFIXES = (".toml", ".json")
+_MIN_KEY_LEN = 4  # mirror bindings._MIN_IDENT: shorter keys are noise
+
 
 def python_symbol_definers(repo: Path) -> dict[str, tuple[str, ...]]:
     """Symbol name -> the sorted, unique git-tracked `.py` files that define it
@@ -152,6 +158,106 @@ def claim_symbol_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple
     return out
 
 
+def _walk_keys(obj: object):
+    """Yield every string dict-key in a nested TOML/JSON structure (recursively)."""
+    if isinstance(obj, dict):
+        for k, v in obj.items():
+            if isinstance(k, str):
+                yield k
+            yield from _walk_keys(v)
+    elif isinstance(obj, list):
+        for item in obj:
+            yield from _walk_keys(item)
+
+
+def config_key_index(repo: Path) -> tuple[dict[str, tuple[str, ...]], tuple[str, ...]]:
+    """(key -> sorted tracked .toml/.json files defining it, sorted unparseable files).
+
+    Keys shorter than _MIN_KEY_LEN are dropped as noise. A supported config file that
+    fails to parse is returned in the second element — a LOUD diagnostic, never a
+    silent skip that would hide a missed binding. YAML is not indexed (no runtime dep).
+    """
+    repo = repo.resolve()
+    keys: dict[str, set[str]] = {}
+    unparseable: list[str] = []
+    try:
+        tracked = gitio.ls_files(repo)
+    except gitio.GitError:
+        return ({}, ())  # not a git checkout: degrade to no index (never blocks)
+    for rel in tracked:
+        if not rel.endswith(_CONFIG_SUFFIXES):
+            continue
+        path = repo / rel
+        try:
+            if not path.is_file() or path.stat().st_size > _MAX_FILE_BYTES:
+                continue
+            raw = path.read_text(encoding="utf-8")
+            data = tomllib.loads(raw) if rel.endswith(".toml") else json.loads(raw)
+        except (
+            OSError,
+            UnicodeDecodeError,
+            tomllib.TOMLDecodeError,
+            json.JSONDecodeError,
+            RecursionError,
+        ):
+            unparseable.append(rel)
+            continue
+        for key in _walk_keys(data):
+            if len(key) >= _MIN_KEY_LEN:
+                keys.setdefault(key, set()).add(rel)
+    return ({k: tuple(sorted(v)) for k, v in sorted(keys.items())}, tuple(sorted(unparseable)))
+
+
+def claim_config_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple[str, ...]]:
+    """claim id -> the config file(s) to add to its watch set: for every identifier-shaped
+    token in the claim text that is a config key defined in EXACTLY ONE tracked .toml/.json.
+    Ambiguous keys (>1 file) are skipped (see ambiguous_config_mentions). Additive and
+    trigger-only — a config change re-checks the claim; the checker still decides truth."""
+    claim_tokens = {c.id: _tokens(c.text) for c in claims if isinstance(c.text, str)}
+    if not any(claim_tokens.values()):
+        return {}
+    index, _ = config_key_index(repo)
+    out: dict[str, tuple[str, ...]] = {}
+    for claim in claims:
+        paths: set[str] = set()
+        for token in claim_tokens.get(claim.id, ()):
+            files = index.get(token)
+            if files is not None and len(files) == 1:
+                paths.add(files[0])
+        if paths:
+            out[claim.id] = tuple(sorted(paths))
+    return out
+
+
+def claim_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple[str, ...]]:
+    """All deterministic re-check watches dorian binds per claim: Python symbol-definer
+    files + pyproject console scripts (claim_symbol_watch_paths) UNION config-key files
+    (claim_config_watch_paths). Union, sorted, deduped. Conservative and additive — it
+    only ever widens the re-check trigger set; it never proves a claim true."""
+    merged: dict[str, set[str]] = {}
+    for source in (claim_symbol_watch_paths(repo, claims), claim_config_watch_paths(repo, claims)):
+        for cid, paths in source.items():
+            merged.setdefault(cid, set()).update(paths)
+    return {cid: tuple(sorted(paths)) for cid, paths in merged.items()}
+
+
+def ambiguous_config_mentions(
+    repo: Path, claims: list[Claim]
+) -> dict[str, dict[str, tuple[str, ...]]]:
+    """claim id -> {config key: defining files} for keys a LOAD-BEARING claim mentions
+    that are defined in MORE THAN ONE tracked config file — the ambiguous case binding
+    skips. Lets verify/bind-suggest surface the skip rather than guess. {} if none."""
+    index, _ = config_key_index(repo)
+    out: dict[str, dict[str, tuple[str, ...]]] = {}
+    for claim in claims:
+        if not claim.load_bearing or not isinstance(claim.text, str):
+            continue
+        ambiguous = {tok: index[tok] for tok in _tokens(claim.text) if len(index.get(tok, ())) > 1}
+        if ambiguous:
+            out[claim.id] = ambiguous
+    return out
+
+
 def ambiguous_symbol_mentions(
     repo: Path, claims: list[Claim]
 ) -> dict[str, dict[str, tuple[str, ...]]]:
diff --git a/tests/test_claude_code_docs.py b/tests/test_claude_code_docs.py
index d75ab02..26c428b 100644
--- a/tests/test_claude_code_docs.py
+++ b/tests/test_claude_code_docs.py
@@ -141,13 +141,15 @@ def test_public_benchmark_manifest_contains_only_public_repos() -> None:
             assert "genai-core" not in path.read_text(encoding="utf-8").lower()
 
 
-# --- Slice E: trusted-base Action design (HUMAN REVIEW REQUIRED, no code) -----------
+# --- Slice E: trusted-base Action design (IMPLEMENTED in V1, with caveats) ----------
 
 
-def test_trusted_base_action_design_is_flagged_and_safe_by_default() -> None:
+def test_trusted_base_action_design_states_implemented_and_keeps_caveats() -> None:
     doc = _read("docs/TRUSTED_BASE_ACTION_DESIGN.md")
-    assert "HUMAN REVIEW REQUIRED" in doc
-    assert "not implemented" in doc.lower()
+    # the design is now implemented (revalidate --checker-source base + Action input)
+    assert "IMPLEMENTED" in doc
+    assert "tests/test_trusted_base.py" in doc  # the §6 matrix is realized in tests
+    # the hard safety constraints must remain stated even after implementation
     assert "pull_request_target" in doc  # the never-use constraint is stated
     assert "does not sandbox PR-head code" in doc
 
diff --git a/tests/test_config_binding.py b/tests/test_config_binding.py
new file mode 100644
index 0000000..b97e010
--- /dev/null
+++ b/tests/test_config_binding.py
@@ -0,0 +1,168 @@
+"""Multi-index binding (WP5): config-key index for TOML/JSON.
+
+Binding widens the set of source changes that RE-CHECK a claim. v0.11 indexed only
+Python definers and pyproject console scripts; this adds config keys in tracked
+`.toml`/`.json` files, so a claim mentioning a config key is re-checked when the
+defining config file changes. It stays conservative and trigger-only:
+
+- only UNAMBIGUOUS keys (defined in exactly one tracked config file) are bound; a key
+  in more than one file is surfaced and left unwatched (a wrong watch is a false alarm);
+- an unparseable supported config file is surfaced as a diagnostic, never a silent skip;
+- YAML is intentionally NOT indexed — parsing it needs a third-party dependency and
+  dorian's core has zero runtime deps;
+- binding never proves truth — it only decides WHEN the claim is re-checked.
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from conftest import commit_all, git, write
+from dorian import cli, symbol_index
+from dorian.model import Claim
+
+
+def _repo(tmp_path: Path) -> Path:
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    git(repo, "init", "-q", "-b", "main")
+    return repo
+
+
+def _claim(cid: str, text: str, program: str) -> Claim:
+    from dorian.model import CheckerSpec
+
+    return Claim(
+        id=cid,
+        text=text,
+        kind="quantity",
+        load_bearing=True,
+        checkers=(CheckerSpec(type="C3", program=program),),
+    )
+
+
+def test_config_key_index_toml_and_json(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "settings.toml", "[database]\nmax_connections = 5\n")
+    write(repo, "feature.json", '{"flags": {"new_login": true}}\n')
+    commit_all(repo, "config")
+    index, unparseable = symbol_index.config_key_index(repo)
+    assert index["max_connections"] == ("settings.toml",)
+    assert index["database"] == ("settings.toml",)
+    assert index["new_login"] == ("feature.json",)
+    assert unparseable == ()
+
+
+def test_claim_mentioning_config_key_binds_the_file(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "settings.toml", "[database]\nmax_connections = 5\n")
+    commit_all(repo, "config")
+    claims = [_claim("c", "The `max_connections` pool size is 5.", "path:settings.toml")]
+    watch = symbol_index.claim_config_watch_paths(repo, claims)
+    assert watch == {"c": ("settings.toml",)}
+
+
+def test_ambiguous_config_key_is_skipped_and_surfaced(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "a.toml", "max_connections = 5\n")
+    write(repo, "b.toml", "max_connections = 9\n")
+    commit_all(repo, "two configs")
+    claims = [_claim("c", "The `max_connections` value matters.", "path:a.toml")]
+    # ambiguous (2 files) -> no guessed watch
+    assert symbol_index.claim_config_watch_paths(repo, claims) == {}
+    # ...but surfaced for the author to disambiguate
+    amb = symbol_index.ambiguous_config_mentions(repo, claims)
+    assert set(amb["c"]["max_connections"]) == {"a.toml", "b.toml"}
+
+
+def test_unparseable_config_is_a_diagnostic_not_silent(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "broken.json", "{not valid json\n")
+    write(repo, "ok.toml", "key_name = 1\n")
+    commit_all(repo, "configs")
+    index, unparseable = symbol_index.config_key_index(repo)
+    assert "broken.json" in unparseable  # surfaced, not silently dropped
+    assert index["key_name"] == ("ok.toml",)  # the parseable one still indexes
+
+
+def test_yaml_is_not_indexed_zero_runtime_dep(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "conf.yaml", "max_connections: 5\n")
+    commit_all(repo, "yaml")
+    index, unparseable = symbol_index.config_key_index(repo)
+    assert "max_connections" not in index  # YAML excluded by design
+    assert "conf.yaml" not in unparseable  # not even attempted
+
+
+def test_claim_watch_paths_merges_symbol_and_config(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "src/app.py", "def make_pool():\n    return 1\n")
+    write(repo, "settings.toml", "[server]\nmax_workers = 4\n")
+    commit_all(repo, "code + config")
+    claims = [_claim("c", "`make_pool` reads `max_workers` from settings.", "path:settings.toml")]
+    watch = symbol_index.claim_watch_paths(repo, claims)
+    assert set(watch["c"]) == {"src/app.py", "settings.toml"}  # symbol + config union
+
+
+# --- end to end: a config-key claim re-checks when the config file changes -----
+
+
+def test_verify_binds_config_and_revalidate_rechecks(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "settings.toml", "[server]\nmax_workers = 4\n")
+    write(repo, "note.md", "# note\n\nThe server uses 4 max_workers.\n")
+    commit_all(repo, "init")
+    base = git(repo, "rev-parse", "HEAD")
+    # The claim is checked by a regex on the TOML value itself. A regex checker is used
+    # because `py-const:` is a Python-AST checker and settings.toml is not Python source.
+    # Binding is asserted separately below: the config-key index must map `max_workers`
+    # to settings.toml independently of which file the checker names.
+    claims = {
+        "claims": [
+            {
+                "id": "workers",
+                "text": "The server `max_workers` pool is 4.",
+                "kind": "quantity",
+                "load_bearing": True,
+                "checkers": [
+                    {"type": "C3", "program": r"regex:settings.toml::max_workers\s*=\s*4"}
+                ],
+            }
+        ]
+    }
+    cp = repo / "claims.json"
+    cp.write_text(json.dumps(claims), encoding="utf-8")
+    assert cli.main(["--repo", str(repo), "verify", "note.md", "--claims", str(cp)]) == 0
+
+    # the symbol/config binding is moot here (checker already names settings.toml), so
+    # assert the config index would bind it independently of the checker:
+    parsed = [Claim.from_dict(c) for c in claims["claims"]]
+    assert "settings.toml" in symbol_index.claim_watch_paths(repo, parsed).get("workers", ())
+
+    write(repo, "settings.toml", "[server]\nmax_workers = 8\n")  # drift
+    assert cli.main(["--repo", str(repo), "revalidate", "--since", base]) == cli.EXIT_REVOKED
+
+
+def test_bind_suggest_shows_config_provenance(tmp_path: Path, capsys) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "settings.toml", "[server]\nmax_workers = 4\n")
+    write(repo, "broken.json", "{bad\n")
+    commit_all(repo, "configs")
+    claims = {
+        "claims": [
+            {
+                "id": "w",
+                "text": "The `max_workers` pool is 4.",
+                "kind": "quantity",
+                "load_bearing": True,
+                "checkers": [{"type": "C3", "program": "path:settings.toml"}],
+            }
+        ]
+    }
+    cp = repo / "claims.json"
+    cp.write_text(json.dumps(claims), encoding="utf-8")
+    assert cli.main(["--repo", str(repo), "bind-suggest", "--claims", str(cp)]) == 0
+    out = capsys.readouterr().out
+    assert "config" in out  # provenance label for the config-key binding
+    assert "broken.json" in out  # unparseable config surfaced as a diagnostic
diff --git a/tests/test_symbol_index.py b/tests/test_symbol_index.py
index 4b28eb3..dc5af15 100644
--- a/tests/test_symbol_index.py
+++ b/tests/test_symbol_index.py
@@ -459,7 +459,10 @@ def test_pyproject_script_binds_target_file(fixture_repo: Path) -> None:
     commit_all(fixture_repo, "add a console-script entry point")
     assert _verify(fixture_repo, [_SCRIPT_CLAIM]) == 0
     w = _warrant(fixture_repo)
-    assert w.claims[0].checkers[0].watch == ("src/routes.py", "pkg/cli.py")
+    # symbol-definer binding adds the script target (pkg/cli.py); the multi-index
+    # config-key binding also watches pyproject.toml, where `mytool` is declared —
+    # so a change to the script entry itself re-checks the claim too.
+    assert w.claims[0].checkers[0].watch == ("src/routes.py", "pkg/cli.py", "pyproject.toml")
     assert "pkg/cli.py" in {e.uri for e in w.read_set}
 
 

From 2a66a49eee7b8aa069d7fb9222572b272493856d Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 17:58:45 +0530
Subject: [PATCH 04/13] =?UTF-8?q?feat(v1):=20warrant-quality=20mutation=20?=
 =?UTF-8?q?harness=20=E2=80=94=20dorian=20bench=20warrant-quality=20(WP8)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Offline, per-claim evidence. For each claim, the harness derives deterministic mutations
from the checker grammar and records whether the verdict matched the expectation:
- falsify (rename symbol / reassign const / change param): expect FAIL; a PASS is a MISS;
- benign (trailing comment): expect PASS; a FAIL is BRITTLE (false alarm);
- ceiling (content drift keeping an existence symbol): expect PASS, recorded as the
  documented trigger-vs-truth ceiling, never a penalty.
ERROR (e.g. an executable checker under --deny-exec) is its own bucket, never a miss.
Output is deterministic (no timestamps/randomness) and never mutates the real repo —
each mutation runs against a throwaway copy of only the file the checker reads.

Honest scope: structural/existence C3 forms are mutation-scored; string/regex/code, typed
C5, C1, C4 are reported with strength and `mutation: unsupported` (no fabricated mutation).
Registered as the `warrant-quality` bench subcommand. 6 tests; 645 non-slow pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 V1_IMPLEMENTATION_TRACKER.md  |   2 +-
 bench/warrant_quality.py      | 233 ++++++++++++++++++++++++++++++++++
 src/dorian/commands.py        |   1 +
 tests/test_warrant_quality.py | 142 +++++++++++++++++++++
 4 files changed, 377 insertions(+), 1 deletion(-)
 create mode 100644 bench/warrant_quality.py
 create mode 100644 tests/test_warrant_quality.py
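
Review note: the scoring rule described above can be sketched in isolation as follows.
This is a simplified stand-in, not the module itself. Verdicts are plain strings here
(an assumption; the patch uses the `Verdict` enum) and the function names are
illustrative.

```python
def score_mutation(verdict, expectation):
    """Classify one mutation outcome; ERROR always gets its own bucket."""
    if verdict == "ERROR":
        return "errored"
    if expectation == "falsify":
        return "caught" if verdict == "FAIL" else "missed"
    if expectation == "benign":
        return "brittle" if verdict == "FAIL" else "ok"
    # remaining expectation is "ceiling": a PASS is the documented blind spot
    return "ceiling" if verdict == "PASS" else "ceiling_caught"


def quality(outcomes):
    """Roll mutation outcomes up to one label; a miss dominates everything else."""
    if "missed" in outcomes:
        return "weak"
    if "brittle" in outcomes:
        return "brittle"
    if "caught" in outcomes:
        return "strong"
    return "unscored"


# An existence checker: the rename is caught, the content drift is its ceiling.
EXISTENCE = [score_mutation("FAIL", "falsify"), score_mutation("PASS", "ceiling")]
# A policy-blocked checker: every mutation errors, so nothing is scored.
BLOCKED = [score_mutation("ERROR", "falsify"), score_mutation("ERROR", "benign")]
```

Under this rollup a ceiling outcome never lowers the label, so an existence claim whose
rename mutation is caught is labelled `strong`; the weaker guarantee shows up only in
`strongest_strength`.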

diff --git a/V1_IMPLEMENTATION_TRACKER.md b/V1_IMPLEMENTATION_TRACKER.md
index db61176..d9bf12c 100644
--- a/V1_IMPLEMENTATION_TRACKER.md
+++ b/V1_IMPLEMENTATION_TRACKER.md
@@ -106,7 +106,7 @@ Categories: IMPL=must-implement · TEST=must-test regression · DOC=must-documen
 | WP5 | multi-index binding (config-key) | DONE (symbol_index.config_key_index + claim_watch_paths; TOML/JSON only, YAML excluded = zero-dep; provenance in bind-suggest; ambiguity + unparseable surfaced; 9 tests) |
 | WP6 | C4 test-adequacy lint | DONE (strength.c4_adequacy; folded into WP2 tests) |
 | WP7 | trusted-base checker-source mode | DONE (revalidate --checker-source base + Action checker_trust; 10-case exploit matrix) |
-| WP8 | warrant-quality mutation harness | TODO |
+| WP8 | warrant-quality mutation harness | DONE (bench/warrant_quality.py; `dorian bench warrant-quality`; deterministic, offline, never mutates real repo; trigger vs verdict; ERROR bucket distinct; honest scope = structural/existence forms scored, others reported strength-only; 6 tests) |
 | WP9 | current-version benchmark results | TODO |
 | WP10 | V1 release prep / decision | TODO |
 
diff --git a/bench/warrant_quality.py b/bench/warrant_quality.py
new file mode 100644
index 0000000..3b0862d
--- /dev/null
+++ b/bench/warrant_quality.py
@@ -0,0 +1,233 @@
+"""`dorian bench warrant-quality` — per-claim warrant quality by mutation testing.
+
+Repo-level benchmarks answer "does the mechanism work on this suite?". This answers
+the question a USER actually has about THEIR warrant: *for this claim, would the
+checker catch the drift it is supposed to?* It is an OFFLINE evidence generator, not
+runtime revalidation, and it never mutates the real repo (each mutation runs against a
+throwaway copy of only the file the checker reads).
+
+For each claim it derives deterministic mutations from the checker grammar and records,
+per mutation, whether the checker's verdict matched the expectation:
+
+- **falsify** — a change that SHOULD make the claim false (rename the symbol, change the
+  constant's value, change a parameter). Expect FAIL. A PASS here is a **miss** — the
+  checker cannot see the very drift it implies it guards.
+- **benign** — a formatting-only change the checker promises to tolerate. Expect PASS. A
+  FAIL here is **brittle** — a false alarm.
+- **ceiling** — a content change that the checker is KNOWN to be blind to (an existence
+  checker cannot see a body/content change). Expect PASS, recorded as the documented
+  trigger-vs-truth ceiling, never counted against the claim.
+
+Scope (honest): mutations are generated for the C3 structural/existence forms
+(`symbol:`, `py-signature:`, `py-const:`) where a falsifying edit is mechanically
+unambiguous. Other forms (`string:`/`regex:`/`code:`, typed C5, C1, C4) are reported
+with their checker strength and `mutation: unsupported` — the harness does not fabricate
+a mutation it cannot make deterministically. ERROR (e.g. an executable checker blocked by
+`--deny-exec`) is its own bucket, never conflated with a miss or a FAIL. Output is
+deterministic (no timestamps, no randomness) and stable for golden tests.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import re
+import sys
+import tempfile
+from collections.abc import Iterator
+from pathlib import Path
+
+from dorian.checkers.base import CheckContext, Verdict, run_checker
+from dorian.model import CheckerSpec, Claim, Warrant
+from dorian.policy import ExecutionPolicy, executable_kind
+from dorian.strength import claim_strength
+
+SCHEMA = "dorian-warrant-quality-v1"
+
+# (mutation name, expectation) — expectation is the verdict that means "as designed".
+_FALSIFY = "falsify"  # expect FAIL (catch the drift)
+_BENIGN = "benign"  # expect PASS (tolerate formatting)
+_CEILING = "ceiling"  # expect PASS (known-blind; never a penalty)
+
+
+def _c3_parts(spec: CheckerSpec) -> tuple[str, str, str]:
+    """(prefix, file, operand) for a `<prefix>:<file>::<operand>` C3 program."""
+    prefix, _, rest = spec.program.partition(":")
+    file, _, operand = rest.partition("::")
+    return prefix, file, operand
+
+
+def _mutations(spec: CheckerSpec) -> Iterator[tuple[str, str, str, object]]:
+    """Yield (name, file, expectation, mutate) where mutate(text)->text. Empty for
+    forms the harness does not deterministically mutate."""
+    if spec.type != "C3":
+        return
+    prefix, file, operand = _c3_parts(spec)
+    if not file:
+        return
+    if prefix == "symbol":
+        name = operand
+        yield (
+            "rename_definition",
+            file,
+            _FALSIFY,
+            lambda t: re.sub(rf"\b(def|class)\s+{re.escape(name)}\b", r"\1 dorian_mut", t),
+        )
+        # an existence checker is blind to any content change that keeps the name:
+        yield ("append_content", file, _CEILING, lambda t: t + "\n# dorian: content drift\n")
+    elif prefix == "py-const":
+        qual = operand.partition("::")[0]
+        if "." in qual:
+            return  # dotted (class attr): a top-level reassignment would not shadow it
+        yield ("reassign_value", file, _FALSIFY, lambda t: t + f'\n{qual} = "__dorian_mut__"\n')
+        yield ("append_comment", file, _BENIGN, lambda t: t + "\n# dorian: benign comment\n")
+    elif prefix == "py-signature":
+        qual = operand.partition("::")[0]
+        if "." in qual:
+            return  # dotted (method): a top-level def would not shadow it
+        yield (
+            "change_param",
+            file,
+            _FALSIFY,
+            lambda t: t + f"\n\ndef {qual}(dorian_mut_param):\n    return None\n",
+        )
+        yield ("append_comment", file, _BENIGN, lambda t: t + "\n# dorian: benign comment\n")
+
+
+def _run_mutated(repo: Path, claim: Claim, spec_index: int, file: str, mutate, policy) -> Verdict:
+    """Run one checker against a throwaway copy of `file` with `mutate` applied. Only the
+    one file the checker reads is materialized — the real repo is never touched."""
+    original = (repo / file).read_text(encoding="utf-8", errors="replace")
+    with tempfile.TemporaryDirectory() as td:
+        work = Path(td)
+        target = work / file
+        target.parent.mkdir(parents=True, exist_ok=True)
+        target.write_text(mutate(original), encoding="utf-8")
+        ctx = CheckContext(repo=work, claim=claim, policy=policy)
+        return run_checker(ctx, spec_index).verdict
+
+
+def _score_mutation(verdict: Verdict, expectation: str) -> str:
+    """Classify a mutation outcome. ERROR is always its own bucket."""
+    if verdict is Verdict.ERROR:
+        return "errored"
+    if expectation == _FALSIFY:
+        return "caught" if verdict is Verdict.FAIL else "missed"
+    if expectation == _BENIGN:
+        return "brittle" if verdict is Verdict.FAIL else "ok"
+    return "ceiling" if verdict is Verdict.PASS else "ceiling_caught"  # _CEILING
+
+
+def score_claim(repo: Path, claim: Claim, policy: ExecutionPolicy) -> dict:
+    """Per-claim quality record: strength, trigger watch, and mutation outcomes."""
+    strongest = claim_strength(claim)  # rank-aware (existence < ... < behavioral)
+    mutations: list[dict] = []
+    for i, spec in enumerate(claim.checkers):
+        any_mutation = False
+        for name, file, expectation, mutate in _mutations(spec):
+            any_mutation = True
+            verdict = _run_mutated(repo, claim, i, file, mutate, policy)
+            mutations.append(
+                {
+                    "checker": spec.type,
+                    "program": spec.program,
+                    "mutation": name,
+                    "expectation": expectation,
+                    "verdict": verdict.value,
+                    "outcome": _score_mutation(verdict, expectation),
+                }
+            )
+        if not any_mutation:
+            mutations.append(
+                {
+                    "checker": spec.type,
+                    "program": spec.program,
+                    "mutation": "unsupported",
+                    "expectation": "n/a",
+                    "verdict": "n/a",
+                    "outcome": "unsupported",
+                    "executes": executable_kind(spec),
+                }
+            )
+    outcomes = [m["outcome"] for m in mutations]
+    quality = (
+        "weak"
+        if "missed" in outcomes
+        else "brittle"
+        if "brittle" in outcomes
+        else "strong"
+        if "caught" in outcomes
+        else "unscored"
+    )
+    return {
+        "claim_id": claim.id,
+        "kind": claim.kind,
+        "load_bearing": claim.load_bearing,
+        "strongest_strength": strongest,
+        "watch": sorted({w for s in claim.checkers for w in s.watch}),
+        "mutations": mutations,
+        "quality": quality,
+    }
+
+
+def summarize(records: list[dict]) -> dict:
+    counts: dict[str, int] = {}
+    for r in records:
+        counts[r["quality"]] = counts.get(r["quality"], 0) + 1
+    mut: dict[str, int] = {}
+    for r in records:
+        for m in r["mutations"]:
+            mut[m["outcome"]] = mut.get(m["outcome"], 0) + 1
+    return {
+        "claims": len(records),
+        "by_quality": dict(sorted(counts.items())),
+        "by_outcome": dict(sorted(mut.items())),
+    }
+
+
+def render(report: dict) -> str:
+    lines = [f"# warrant quality: {report['artifact_uri']}", ""]
+    for r in report["claims"]:
+        lines.append(f"- `{r['claim_id']}` [{r['quality']}] strongest={r['strongest_strength']}")
+        for m in r["mutations"]:
+            lines.append(f"    {m['mutation']}: {m['outcome']} ({m['verdict']})")
+    s = report["summary"]
+    lines += ["", f"{s['claims']} claim(s): {s['by_quality']}", f"mutations: {s['by_outcome']}"]
+    return "\n".join(lines) + "\n"
+
+
+def build(repo: Path, artifact_uri: str, policy: ExecutionPolicy) -> dict:
+    warrant = Warrant.load(repo / (artifact_uri + ".warrant"))
+    records = [score_claim(repo, c, policy) for c in warrant.claims]
+    return {
+        "schema": SCHEMA,
+        "artifact_uri": artifact_uri,
+        "claims": records,
+        "summary": summarize(records),
+    }
+
+
+def main(argv: list[str]) -> int:
+    ap = argparse.ArgumentParser(prog="dorian bench warrant-quality")
+    ap.add_argument("artifact", help="warranted artifact (its .warrant is scored)")
+    ap.add_argument("--repo", default=".")
+    ap.add_argument("--json", action="store_true")
+    ap.add_argument("--deny-exec", action="store_true")
+    ap.add_argument("--deny-shell", action="store_true")
+    args = ap.parse_args(argv)
+    repo = Path(args.repo).resolve()
+    artifact_uri = Path(args.artifact).as_posix()
+    sidecar = repo / (artifact_uri + ".warrant")
+    if not sidecar.is_file():
+        print(f"dorian bench warrant-quality: no warrant for {artifact_uri}", file=sys.stderr)
+        return 2
+    policy = ExecutionPolicy.from_flags_and_env(
+        deny_exec=args.deny_exec, deny_shell=args.deny_shell
+    )
+    report = build(repo, artifact_uri, policy)
+    print(json.dumps(report, sort_keys=True, indent=2) if args.json else render(report), end="")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main(sys.argv[1:]))
diff --git a/src/dorian/commands.py b/src/dorian/commands.py
index d6feb9b..5e0e80a 100644
--- a/src/dorian/commands.py
+++ b/src/dorian/commands.py
@@ -732,6 +732,7 @@ def cmd_report(args: argparse.Namespace) -> int:
     "large-mutation": ("bench.large_mutation", False),
     "binding-lifecycle": ("bench.binding_lifecycle", False),
     "realworld-usecases": ("bench.realworld_usecases", False),
+    "warrant-quality": ("bench.warrant_quality", False),
     "churn": ("bench.churn", False),
 }
 
diff --git a/tests/test_warrant_quality.py b/tests/test_warrant_quality.py
new file mode 100644
index 0000000..b3bebe7
--- /dev/null
+++ b/tests/test_warrant_quality.py
@@ -0,0 +1,142 @@
+"""`dorian bench warrant-quality` (WP8): per-claim mutation scoring.
+
+Covers the WP8 acceptance matrix: deterministic output; an existence claim records its
+ceiling mutation as `ceiling` and reports `strongest_strength == "existence"`; a
+structural claim catches its falsifying mutation; a benign trailing comment is scored
+`ok`, not `brittle`; an unsupported form is reported as `unsupported`, never faked.
+Offline and side-effect-free (the real repo is never mutated)."""
+
+from __future__ import annotations
+
+import importlib
+import sys
+from pathlib import Path
+
+import pytest
+
+from conftest import commit_all, git, write
+from dorian.model import ProducedBy, ReadSet
+from dorian.policy import ExecutionPolicy
+from dorian.seal import seal_artifact
+
+PB = ProducedBy(runner="manual", captured_at="2026-01-01T00:00:00Z")
+
+
+@pytest.fixture
+def wq():
+    """The repo-local bench module (loaded the way `dorian bench` loads it)."""
+    sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+    return importlib.import_module("bench.warrant_quality")
+
+
+def _repo(tmp_path: Path) -> Path:
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    git(repo, "init", "-q", "-b", "main")
+    write(repo, "src/auth.py", "def verify_token(token):\n    return bool(token)\n")
+    write(repo, "src/config.py", "TIMEOUT = 30\n")
+    write(repo, "note.md", "# note\n\nstuff\n")
+    commit_all(repo, "init")
+    return repo
+
+
+def _seal(repo: Path, claims: list[dict]) -> None:
+    from dorian.model import CheckerSpec, Claim
+
+    objs = [
+        Claim(
+            id=c["id"],
+            text=c.get("text", c["id"]),
+            kind=c["kind"],
+            load_bearing=c.get("load_bearing", True),
+            checkers=(CheckerSpec(type="C3", program=c["program"]),),
+        )
+        for c in claims
+    ]
+    seal_artifact(repo, "note.md", ReadSet(entries=(), produced_by=PB), objs)
+
+
+def _score(wq, repo: Path) -> dict:
+    return wq.build(repo, "note.md", ExecutionPolicy())
+
+
+def test_structural_claim_catches_falsifying_mutation(wq, tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    _seal(
+        repo,
+        [{"id": "timeout", "kind": "quantity", "program": "py-const:src/config.py::TIMEOUT::30"}],
+    )
+    rec = _score(wq, repo)["claims"][0]
+    assert rec["quality"] == "strong"
+    falsify = next(m for m in rec["mutations"] if m["expectation"] == "falsify")
+    assert falsify["outcome"] == "caught"  # reassigning the value -> FAIL
+
+
+def test_signature_claim_catches_param_change(wq, tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    _seal(
+        repo,
+        [
+            {
+                "id": "sig",
+                "kind": "behavior",
+                "program": "py-signature:src/auth.py::verify_token::token",
+            }
+        ],
+    )
+    rec = _score(wq, repo)["claims"][0]
+    falsify = next(m for m in rec["mutations"] if m["expectation"] == "falsify")
+    assert falsify["outcome"] == "caught"
+    benign = next(m for m in rec["mutations"] if m["expectation"] == "benign")
+    assert benign["outcome"] == "ok"  # a trailing comment must not false-alarm
+
+
+def test_existence_claim_shows_its_ceiling(wq, tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    _seal(
+        repo, [{"id": "exists", "kind": "behavior", "program": "symbol:src/auth.py::verify_token"}]
+    )
+    rec = _score(wq, repo)["claims"][0]
+    outcomes = {m["mutation"]: m["outcome"] for m in rec["mutations"]}
+    assert outcomes["rename_definition"] == "caught"  # renaming the def IS caught
+    assert outcomes["append_content"] == "ceiling"  # content drift keeping the name is NOT
+    assert rec["strongest_strength"] == "existence"
+
+
+def test_unsupported_form_is_reported_not_faked(wq, tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "data.csv", "a,b\n1,2\n")
+    commit_all(repo, "data")
+    _seal(repo, [{"id": "rows", "kind": "quantity", "program": "string:src/config.py::TIMEOUT"}])
+    rec = _score(wq, repo)["claims"][0]
+    # string: is not deterministically mutated by the harness -> reported, never faked
+    assert any(m["outcome"] == "unsupported" for m in rec["mutations"])
+
+
+def test_deterministic_output(wq, tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    _seal(
+        repo,
+        [{"id": "timeout", "kind": "quantity", "program": "py-const:src/config.py::TIMEOUT::30"}],
+    )
+    assert _score(wq, repo) == _score(wq, repo)
+    # and the real repo file was never mutated
+    assert (repo / "src/config.py").read_text() == "TIMEOUT = 30\n"
+
+
+def test_cli_smoke_json(wq, tmp_path: Path, capsys) -> None:
+    from dorian import cli
+
+    repo = _repo(tmp_path)
+    _seal(
+        repo,
+        [{"id": "timeout", "kind": "quantity", "program": "py-const:src/config.py::TIMEOUT::30"}],
+    )
+    # --repo after the bench subcommand: it flows to the bench module's own parser
+    rc = cli.main(["bench", "warrant-quality", "note.md", "--repo", str(repo), "--json"])
+    assert rc == 0
+    import json
+
+    out = json.loads(capsys.readouterr().out)
+    assert out["schema"] == "dorian-warrant-quality-v1"
+    assert out["summary"]["by_quality"].get("strong") == 1

From 4e586a7fe08557056548c228a8467b7e62fa9add Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 18:08:22 +0530
Subject: [PATCH 05/13] docs(v1): current-version benchmark results + evidence
 hygiene (WP9, WP1)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- docs/BENCHMARK_CURRENT.md: version- and commit-stamped reruns of the reproducible
  suites on current code — large-mutation (240 pairs, P=R=0.93, 11.6x/10.4x FP reduction),
  binding-lifecycle (808 pairs, selection recall 0.54->1.00, alarm precision/recall 1.00,
  0 errored), realworld (5 cases: 2 solved / 1 partial / 2 not_solved), and the new
  warrant-quality harness. The reruns
  MATCH the historical runs (same content-derived run_id), proving the V1 changes are
  additive and do not regress the benchmarks. Includes a what-this-does-NOT-prove block.
- HISTORICAL banners on docs/BENCHMARK_v0.7.0.md (v0.7.0) and
  docs/BENCHMARK_BINDING_LIFECYCLE.md (0.9.0), each pointing to BENCHMARK_CURRENT.md;
  the historical numbers are preserved verbatim.
- docs/V1_SCOPE.md: what V1 strengthening means and does NOT mean (no universal semantic
  correctness; trusted-base is a trust root not a sandbox; config binding is TOML/JSON only;
  code:/structural are Python-only; extractor stays draft; carried-forward limitations).
- README: trust-state legend (WARRANTED born -> TRUSTED/DEGRADED/REVOKED/UNKNOWN), historical
  labels on the benchmark citations, command-surface entries for the new C3 forms,
  checker-strength in bindings, config provenance in bind-suggest, checker-source base, and
  bench warrant-quality.
- tests/test_benchmark_evidence.py: wording guards (historical docs labeled; current doc
  version/commit-stamped with a non-overclaim block; README links current; V1_SCOPE boundary).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 README.md                           | 38 ++++++++++----
 V1_IMPLEMENTATION_TRACKER.md        |  2 +-
 docs/BENCHMARK_BINDING_LIFECYCLE.md |  6 +++
 docs/BENCHMARK_CURRENT.md           | 77 +++++++++++++++++++++++++++++
 docs/BENCHMARK_v0.7.0.md            |  5 ++
 docs/V1_SCOPE.md                    | 64 ++++++++++++++++++++++++
 tests/test_benchmark_evidence.py    | 61 +++++++++++++++++++++++
 7 files changed, 243 insertions(+), 10 deletions(-)
 create mode 100644 docs/BENCHMARK_CURRENT.md
 create mode 100644 docs/V1_SCOPE.md
 create mode 100644 tests/test_benchmark_evidence.py

diff --git a/README.md b/README.md
index ee88db8..06c59cf 100644
--- a/README.md
+++ b/README.md
@@ -103,6 +103,12 @@ fold      sha256:7920c71b5a6a9c8e WARRANTED -> REVOKED
 The summary still reads perfectly. Its portrait flipped to **REVOKED** — and every artifact whose
 warrant was built on it is flagged `recalled`, so nobody builds on a claim that silently went false.
 
+> **Trust states.** A warrant is born **WARRANTED**. Each `revalidate` folds it to **TRUSTED**
+> (all re-checked claims hold), **DEGRADED** or **REVOKED** (a claim broke — DEGRADED for a
+> non-load-bearing break, REVOKED for a load-bearing one), or **UNKNOWN** (a checker could not
+> run — ERROR is never silently green and never counted as broken). So `WARRANTED -> REVOKED`
+> above is the born state folding on its first revalidation.
+
 ## We ran this on dorian itself
 
 The `verify` and `revalidate` output above is exactly what dorian prints, shown for an illustrative
@@ -169,7 +175,10 @@ path-scope watcher (58 → 5 false alarms) and **10.4x** versus the stronger lin
 1.00 by construction here; the meaningful axis is their precision.)
 
 These numbers describe a synthetic fixture suite, not your repository, and are not a universal
-performance claim. See [`docs/BENCHMARK_v0.7.0.md`](docs/BENCHMARK_v0.7.0.md) (protocol:
+performance claim. The headline figures were **measured at v0.7.0** and are **historical**; the
+current version reproduces them unchanged (240 pairs, P=R=0.93) — see the version-stamped
+[`docs/BENCHMARK_CURRENT.md`](docs/BENCHMARK_CURRENT.md). See
+[`docs/BENCHMARK_v0.7.0.md`](docs/BENCHMARK_v0.7.0.md) (protocol:
 [`docs/BENCHMARK_PROTOCOL_v0.7.0.md`](docs/BENCHMARK_PROTOCOL_v0.7.0.md)); reproduce with
 `dorian bench large-mutation`, and measure your own repos with the harness in `bench/`.
 
@@ -207,6 +216,8 @@ trigger-vs-truth ceiling, on a real class (**partial**). Two further cases (docu
 sources, not reproduced) are honest misses (**not_solved**). These are scoped reproductions of public
 problem classes — not universal validation.
 
+The 808-pair figures above were **measured at dorian 0.9.0** and are **historical**; the
+current-version rerun (same protocol) is in [`docs/BENCHMARK_CURRENT.md`](docs/BENCHMARK_CURRENT.md).
 See [`docs/BENCHMARK_BINDING_LIFECYCLE.md`](docs/BENCHMARK_BINDING_LIFECYCLE.md) and
 [`docs/REALWORLD_USECASES.md`](docs/REALWORLD_USECASES.md) (protocols alongside each); reproduce with
 `dorian bench binding-lifecycle` and `dorian bench realworld-usecases`.
@@ -359,8 +370,10 @@ A warrant is worth only what its checkers actually catch. The full authoring con
 load-bearing claim, **bind** the file that would change if the claim went false, **prefer**
 shape-tolerant checks like `regex:`/`symbol:`/typed-C5 over brittle `string:`) — lives in
 [`docs/AGENT_CLAIMS.md`](docs/AGENT_CLAIMS.md). Checker program grammars (C1 span, C3
-path/symbol/string/regex, C4 `pytest:<nodeid>`, C5 typed data) are documented in
-[`spec/checkers.md`](spec/checkers.md).
+path/symbol/string/regex plus the V1 structural forms `py-signature:`/`py-const:` and the
+comment/docstring-stripped `code:`, C4 `pytest:<nodeid>`, C5 typed data) are documented in
+[`spec/checkers.md`](spec/checkers.md). What V1 strengthening does and does not promise is in
+[`docs/V1_SCOPE.md`](docs/V1_SCOPE.md).
 
 > **Checker programs are executable.** `dorian verify` *runs* every checker at seal time. C3 and typed
 > C5 only inspect files, but C4 (`pytest:`) and C5 `shell:` execute code — review an agent-emitted
@@ -386,12 +399,16 @@ claims.
   event: a flag only — downstream is never re-checked and its states are untouched. Re-seal with
   `seal --supersede <old-id>` so downstream warrants sealed against the old id stay reachable.
 - `dorian bindings <artifact>` — binding-quality diagnostics (unbacked, single-file, short-literal,
-  ambiguous-mention, trigger-only-symbol, unwatched-mention). Informational, never a gate; output
-  carries file paths only, never matched content. `ambiguous-mention` surfaces a load-bearing claim
-  whose symbol is defined in more than one file (so no definer is auto-watched); `trigger-only-symbol`
-  marks a watch added only as a re-check *trigger* that no checker actually exercises.
-- `dorian bind-suggest --claims claims.json` — read-only preview of the symbol-definer files `verify`
-  would auto-bind for each claim (and the ambiguous symbols it would skip). Writes nothing, never a gate.
+  ambiguous-mention, trigger-only-symbol, unwatched-mention) **plus per-claim checker-strength and
+  claim-risk** (it classifies each checker's *truth strength* and flags adequacy mismatches — a
+  `behavior` claim backed only by an existence checker, a vacuous pytest node). Informational, never a
+  gate; output carries file paths only, never matched content.
+- `dorian bind-suggest --claims claims.json` — read-only preview of the files `verify` would auto-bind
+  for each claim, **with provenance** (symbol-definer vs config-key), the ambiguous symbols/keys it
+  would skip, and any unparseable config file. Writes nothing, never a gate.
+- `dorian revalidate --checker-source base` (also Action `checker_trust: base`; default `head`) —
+  resolve each claim's checker spec from the `--since` base ref so a PR-added or PR-modified executable
+  checker is never executed (public/fork PRs). Fail-closed, **not a sandbox** — pair with `--deny-exec`.
 - `dorian rebind <artifact>` — re-derive a warrant's symbol-definer watches with the current binding
   logic and re-seal it (born-verifiable, superseding the old id), so a warrant sealed before the symbol
   index existed gains the wider watches. The watch only ever widens; a claim that has since become false
@@ -420,6 +437,9 @@ claims.
   benchmark for symbol binding ([`docs/BENCHMARK_BINDING_LIFECYCLE.md`](docs/BENCHMARK_BINDING_LIFECYCLE.md)).
   `dorian bench realworld-usecases` runs the offline public-case reproductions
   ([`docs/REALWORLD_USECASES.md`](docs/REALWORLD_USECASES.md)).
+- `dorian bench warrant-quality <artifact>` — offline per-claim mutation scoring: for each claim, does
+  its checker catch the drift it implies (caught / missed / brittle / ceiling)? Deterministic, never
+  mutates the real repo. Separates trigger from verdict; see [`docs/V1_SCOPE.md`](docs/V1_SCOPE.md).
 
 Exit codes: `0` ok/TRUSTED · `2` usage/infra (incl. a C1 or C5 `shell:` claim handed to `verify`) ·
 `3` DEGRADED · `4` REVOKED/integrity · `5` ERRORED-only (checkers could not run — never conflated with
diff --git a/V1_IMPLEMENTATION_TRACKER.md b/V1_IMPLEMENTATION_TRACKER.md
index d9bf12c..24bcfb7 100644
--- a/V1_IMPLEMENTATION_TRACKER.md
+++ b/V1_IMPLEMENTATION_TRACKER.md
@@ -99,7 +99,7 @@ Categories: IMPL=must-implement · TEST=must-test regression · DOC=must-documen
 
 | WP | Title | Status |
 |---|---|---|
-| WP1 | docs/evidence hygiene | TODO |
+| WP1 | docs/evidence hygiene | DONE (trust-state legend; historical banners on v0.7.0/0.9.0 benchmark docs; docs/V1_SCOPE.md; README command-surface + new-forms + historical labels; benchmark-evidence wording tests) |
 | WP2 | checker-strength / claim-risk linter | DONE (strength.py; surfaced in `bindings` + binding-gate warn; 19 tests) |
 | WP3 | Python structural checkers (py-signature, py-const) | DONE (pyast.py + C3 subgrammars; 27 tests incl. e2e) |
 | WP4 | semantic-context source search (`code:`) | DONE (pyast.code_only_python + C3 `code:`; 12 tests) |
diff --git a/docs/BENCHMARK_BINDING_LIFECYCLE.md b/docs/BENCHMARK_BINDING_LIFECYCLE.md
index 6a21eab..6c51b96 100644
--- a/docs/BENCHMARK_BINDING_LIFECYCLE.md
+++ b/docs/BENCHMARK_BINDING_LIFECYCLE.md
@@ -1,5 +1,11 @@
 # dorian binding-lifecycle benchmark
 
+> **HISTORICAL — measured at dorian 0.9.0** (see the run header below; the preserved 808-pair
+> full run). Evidence about the 0.9.0 implementation, not current behavior. The current-version
+> rerun (0.11.0, identical results — see [`BENCHMARK_CURRENT.md`](BENCHMARK_CURRENT.md)) confirms
+> the V1 changes did not regress it. NOTE: `dorian bench binding-lifecycle` REGENERATES this file;
+> restore it from git after a rerun so the historical record survives.
+
 > Generated from machine output by `bench.binding_lifecycle`. Known-truth labels,
 > in-fixture results — a reproducible demonstration of the MECHANISM on this suite,
 > not evidence about any real repository.
diff --git a/docs/BENCHMARK_CURRENT.md b/docs/BENCHMARK_CURRENT.md
new file mode 100644
index 0000000..daa5f5d
--- /dev/null
+++ b/docs/BENCHMARK_CURRENT.md
@@ -0,0 +1,77 @@
+# Current-version benchmark results
+
+Version-stamped reruns of dorian's reproducible benchmark suites on the **current** code,
+so the published numbers track the implementation rather than lagging behind it. The older
+result docs ([`BENCHMARK_v0.7.0.md`](BENCHMARK_v0.7.0.md) = v0.7.0,
+[`BENCHMARK_BINDING_LIFECYCLE.md`](BENCHMARK_BINDING_LIFECYCLE.md) = 0.9.0) are **historical**
+and are kept as-is for provenance.
+
+## Measurement environment
+
+| field | value |
+| --- | --- |
+| dorian version | `0.11.0` (V1 candidate) |
+| measured commit | `2a66a49eee7b8aa069d7fb9222572b272493856d` |
+| Python | 3.12.4 |
+| platform | darwin (CI matrix: 3.11 / 3.12 / 3.13) |
+| reproduce | `dorian bench large-mutation` · `dorian bench binding-lifecycle` · `dorian bench realworld-usecases` |
+
+## Results
+
+### Large controlled-mutation (240 pairs, 6 synthetic domains)
+
+```
+dorian: precision 0.93 / recall 0.93
+file-change watchers: recall 1.00 / precision 0.34 (naive), 0.56 (path-scope), 0.59 (line-aware)
+false-positive reduction: 11.6x vs path-scope (58 -> 5), 10.4x vs line-aware (52 -> 5)
+```
+
+**Identical to the v0.7.0 historical figures** — the V1 additions (structural checkers,
+semantic-context search, config-key binding, checker-strength diagnostics, trusted-base mode)
+are additive and do **not** regress this suite.
+
+### Binding-lifecycle (808 pairs, 63 synthetic domains, two mechanically-frozen labels)
+
+```
+selection (trigger) recall: checker_path_watcher 0.54 -> bound_dorian_candidate 1.00
+  (286 trigger-stale pairs re-checked that the pre-binding checker-path watcher silently skips)
+selection precision: bound_dorian_candidate 1.00 (vs 0.92 for the rejected "watch any file with the token")
+verdict (alarm) precision/recall: 1.00 / 1.00 (174/174 fact-stale pairs), 0 false BROKEN over all 808
+errored pairs: 0 (ERRORED is reported separately, never an alarm)
+gutted-body ceiling: existence checker fires the trigger but yields 0 BROKEN; only a C4 test catches it
+```
+
+**Identical to the 0.9.0 historical run** (same content-derived `run_id 168b50d9aa631d52`) — again
+confirming the V1 changes did not move the binding-lifecycle numbers.
+
+### Real-world public-case reproductions (5 cases, offline hermetic fixtures)
+
+```
+solved 2 · partial 1 · not_solved 2
+```
+
+Scoped reproductions of public problem *classes* (the public issue is the template; the fixture
+is invented), **not** broad real-world validation.
+
+### Warrant-quality harness (new in V1)
+
+`dorian bench warrant-quality <artifact>` scores, per claim and offline, whether the checker catches
+the drift it implies (caught / missed / brittle / ceiling), separating the trigger layer from the
+verdict layer and keeping ERROR distinct from a miss. It is an evidence generator about *a specific
+warrant*, not a repo-level metric; see [`V1_SCOPE.md`](V1_SCOPE.md).
+
+## What these results prove — and what they do not
+
+**Allowed (per [`VALIDATION_HONESTY.md`](VALIDATION_HONESTY.md)):**
+
+- the mechanism **reproduces** on the named synthetic suites at the stamped version and commit;
+- on those inputs, claim-level revalidation has far fewer false re-checks than a file watcher, and
+  binding's trigger recall is 1.00 (up from 0.54) with zero false BROKEN;
+- the V1 changes did **not regress** the prior numbers (the reruns match the historical runs).
+
+**NOT supported:**
+
+- "works on real repos in general" / "validated" / "production-grade" — these are synthetic suites;
+- that the numbers transfer to your codebase;
+- that binding proves semantic behavior — it widens the re-check trigger; the checker decides truth
+  (the gutted-body ceiling is shown, not solved).
diff --git a/docs/BENCHMARK_v0.7.0.md b/docs/BENCHMARK_v0.7.0.md
index 5789f0f..da0be6c 100644
--- a/docs/BENCHMARK_v0.7.0.md
+++ b/docs/BENCHMARK_v0.7.0.md
@@ -1,5 +1,10 @@
 # dorian large controlled-mutation benchmark (v0.7.0)
 
+> **HISTORICAL — measured at v0.7.0.** These numbers are evidence about the v0.7.0
+> implementation, not current behavior. For the current-version rerun (same protocol,
+> stamped with the measured commit) see [`BENCHMARK_CURRENT.md`](BENCHMARK_CURRENT.md).
+> Reproduce this suite at any version with `dorian bench large-mutation`.
+
 Numbers only. Labels are **known-truth**: each mutation's stale / not-stale
 outcome for a claim is a mechanical consequence of the edit (e.g. changing
 `TIMEOUT = 30` to `10` falsifies the claim "the default timeout is 30 seconds").
diff --git a/docs/V1_SCOPE.md b/docs/V1_SCOPE.md
new file mode 100644
index 0000000..29504c8
--- /dev/null
+++ b/docs/V1_SCOPE.md
@@ -0,0 +1,64 @@
+# What V1 means — and what it does not
+
+dorian's V1 strengthening is **deterministic strengthening on supported domains**, not a
+promise of universal correctness. This page states the boundary so no feature or
+benchmark can be read as more than it is. It is the companion to
+[`VALIDATION_HONESTY.md`](VALIDATION_HONESTY.md) (evidence wording) and
+[`SECURITY_BOUNDARY.md`](SECURITY_BOUNDARY.md) (execution/trust).
+
+## What V1 adds
+
+All additive and backward-compatible; default behavior is unchanged unless you opt in.
+
+- **Python structural checkers** — `py-signature:` and `py-const:` (C3 subgrammars) compare
+  parsed AST structure and literal **values**, closing the `symbol:` existence ceiling and the
+  `string:`/`regex:` comment-survival false-pass for Python signatures and constants.
+- **Semantic-context search** — `code:` runs a regex over comment/docstring-stripped Python,
+  so a fact surviving only in a comment or docstring FAILs while the same fact in real code
+  passes. (`spec/checkers.md`.)
+- **Checker-strength / claim-risk diagnostics** — `dorian bindings` and the `--binding-gate`
+  warn output now classify each checker's *truth strength* and flag kind-vs-strength
+  adequacy mismatches (a `behavior` claim backed only by an existence checker; a vacuous
+  pytest node). Advisory; it never changes a verdict, trust state, or exit code.
+- **Multi-index binding** — config keys in tracked `.toml`/`.json` files now widen a claim's
+  re-check trigger set (with provenance in `bind-suggest`). Conservative and trigger-only.
+- **Trusted-base checker-source mode** — `revalidate --checker-source base` / Action
+  `checker_trust: base` runs only base-approved checker specs, for public/fork PRs.
+- **Warrant-quality harness** — `dorian bench warrant-quality` scores per-claim whether a
+  checker catches the drift it implies, offline and deterministically.
+
+## What V1 does NOT mean
+
+- **Not universal semantic correctness.** dorian verifies *stated claims against the source*
+  with deterministic checkers. It cannot prove arbitrary prose, runtime behavior without a
+  test, external-system state, or anything outside a supported checker/binding domain.
+- **The trigger-vs-truth ceiling is real and visible, not removed.** Binding decides WHEN a
+  claim is re-checked; the checker decides WHETHER it is false. A `symbol:`/`py-signature:`
+  checker is blind to a body-only ("gutted body") change — only a `pytest:` test catches that.
+  The checker-strength diagnostics and the warrant-quality harness *surface* this; they do not
+  eliminate it.
+- **No public-fork safety beyond the trust root.** `checker_trust: base` stops PR-authored
+  executable checkers from running, but a base-approved `pytest:` checker can still execute
+  PR-head code. It is a checker-source trust root, **not a sandbox**; for untrusted forks
+  combine it with `deny_exec: true` (or external isolation). `--deny-exec`/`--deny-shell` are
+  fail-closed, not sandboxes.
+- **Config binding is TOML/JSON only.** YAML is not indexed — parsing it needs a third-party
+  dependency and dorian's core has zero runtime deps. An unparseable supported config file is
+  surfaced as a diagnostic, never silently skipped, but a key dorian cannot index is an honest
+  miss, not a guarantee.
+- **`code:`/structural forms are Python-only.** Other languages still rely on raw `string:`/
+  `regex:` text search, which retains the comment/docstring survival class.
+- **The LLM extractor stays draft/experimental.** V1 does not promote `--extract`; emit claims
+  directly (`docs/AGENT_CLAIMS.md`).
+- **Benchmarks prove reproducibility on named inputs**, never "works on real repos" — see
+  `VALIDATION_HONESTY.md`. Historical result docs (v0.7.0, 0.9.0) are labeled historical;
+  current-version numbers live in `BENCHMARK_CURRENT.md`.
+
+## Known limitations carried into V1 (documented, not fixed)
+
+- **Audit/state atomicity** — a claim/trust-state change and its audit event commit in
+  separate transactions; a crash between the two can leave the event missing (`fold.py`).
+- **Ambiguous bindings are skipped, not resolved** — a symbol or config key defined in more
+  than one file is left unwatched and surfaced for manual binding, never guessed.
+- **ERROR is never BROKEN** — a checker that cannot run (bad program, missing engine, blocked
+  by policy, unresolved base sidecar) is ERRORED, never a staleness verdict, end to end.
diff --git a/tests/test_benchmark_evidence.py b/tests/test_benchmark_evidence.py
new file mode 100644
index 0000000..24aabcc
--- /dev/null
+++ b/tests/test_benchmark_evidence.py
@@ -0,0 +1,61 @@
+"""Benchmark-evidence hygiene (WP1/WP9): historical vs current, honestly labeled.
+
+The repo's published numbers must not silently imply that older results describe the
+current implementation. These pin: historical result docs carry a HISTORICAL banner; the
+current-version results doc is version- and commit-stamped and carries a what-it-does-NOT-
+prove block; the README labels the older numbers historical and points to the current doc;
+and the V1-scope doc states the boundary. Wording tests, no network.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parent.parent
+
+
+def _read(rel: str) -> str:
+    return (ROOT / rel).read_text(encoding="utf-8")
+
+
+def test_historical_benchmark_docs_are_labeled_historical() -> None:
+    for rel in ("docs/BENCHMARK_v0.7.0.md", "docs/BENCHMARK_BINDING_LIFECYCLE.md"):
+        doc = _read(rel)
+        assert "HISTORICAL" in doc, f"{rel} must be labeled HISTORICAL"
+        assert "BENCHMARK_CURRENT.md" in doc, f"{rel} must point to the current-version doc"
+
+
+def test_current_benchmark_doc_is_version_and_commit_stamped() -> None:
+    doc = _read("docs/BENCHMARK_CURRENT.md")
+    assert "0.11.0" in doc  # dorian version stamp
+    assert "measured commit" in doc.lower()
+    assert "Python" in doc  # environment summary
+    assert "reproduce" in doc.lower()
+    # the mandatory non-overclaim block
+    low = doc.lower()
+    assert "not supported" in low or "do not" in low
+    assert "synthetic" in low
+    assert "binding proves" in low  # trigger-vs-truth caveat present
+
+
+def test_current_doc_does_not_claim_real_world_validation() -> None:
+    doc = _read("docs/BENCHMARK_CURRENT.md").lower()
+    # the doc may NEGATE these phrases, but must never assert them
+    assert "works on real repos in general" not in doc.replace(
+        '"works on real repos in general"', ""
+    )
+
+
+def test_readme_labels_benchmarks_historical_and_links_current() -> None:
+    readme = _read("README.md")
+    assert "historical" in readme.lower()
+    assert "docs/BENCHMARK_CURRENT.md" in readme
+
+
+def test_v1_scope_doc_states_the_boundary() -> None:
+    doc = _read("docs/V1_SCOPE.md")
+    low = doc.lower()
+    assert "not universal semantic correctness" in low
+    assert "not a sandbox" in low
+    assert "gutted body" in low or "gutted-body" in low  # the ceiling is named
+    assert "yaml" in low  # the config-binding boundary is stated

From 2a4befaff04e5a4551284f2b91ffa00b0c7d9d6c Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 18:19:36 +0530
Subject: [PATCH 06/13] fix(v1): keep BENCHMARK_v0.7.0.md byte-identical to its
 generator

test_large_mutation::test_committed_doc_matches_render asserts docs/BENCHMARK_v0.7.0.md
== lm.render_markdown(summary), so the generated doc cannot carry a hand-added banner.
Drop the HISTORICAL banner from it (the title already version-stamps it "(v0.7.0)"); its
historical status is conveyed by README + BENCHMARK_CURRENT.md (which names it as the
historical source). The binding-lifecycle doc has no byte-match guard, so it keeps its
banner. Updated test_benchmark_evidence to match: binding-lifecycle by banner, v0.7.0 by
version-stamped title + the current doc's cross-reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/BENCHMARK_v0.7.0.md         |  5 -----
 tests/test_benchmark_evidence.py | 16 ++++++++++++----
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/docs/BENCHMARK_v0.7.0.md b/docs/BENCHMARK_v0.7.0.md
index da0be6c..5789f0f 100644
--- a/docs/BENCHMARK_v0.7.0.md
+++ b/docs/BENCHMARK_v0.7.0.md
@@ -1,10 +1,5 @@
 # dorian large controlled-mutation benchmark (v0.7.0)
 
-> **HISTORICAL — measured at v0.7.0.** These numbers are evidence about the v0.7.0
-> implementation, not current behavior. For the current-version rerun (same protocol,
-> stamped with the measured commit) see [`BENCHMARK_CURRENT.md`](BENCHMARK_CURRENT.md).
-> Reproduce this suite at any version with `dorian bench large-mutation`.
-
 Numbers only. Labels are **known-truth**: each mutation's stale / not-stale
 outcome for a claim is a mechanical consequence of the edit (e.g. changing
 `TIMEOUT = 30` to `10` falsifies the claim "the default timeout is 30 seconds").
diff --git a/tests/test_benchmark_evidence.py b/tests/test_benchmark_evidence.py
index 24aabcc..af4eee6 100644
--- a/tests/test_benchmark_evidence.py
+++ b/tests/test_benchmark_evidence.py
@@ -19,10 +19,18 @@ def _read(rel: str) -> str:
 
 
 def test_historical_benchmark_docs_are_labeled_historical() -> None:
-    for rel in ("docs/BENCHMARK_v0.7.0.md", "docs/BENCHMARK_BINDING_LIFECYCLE.md"):
-        doc = _read(rel)
-        assert "HISTORICAL" in doc, f"{rel} must be labeled HISTORICAL"
-        assert "BENCHMARK_CURRENT.md" in doc, f"{rel} must point to the current-version doc"
+    # the binding-lifecycle doc is NOT byte-matched to its generator, so it carries an
+    # explicit HISTORICAL banner pointing to the current-version doc.
+    bl = _read("docs/BENCHMARK_BINDING_LIFECYCLE.md")
+    assert "HISTORICAL" in bl, "binding-lifecycle doc must be labeled HISTORICAL"
+    assert "BENCHMARK_CURRENT.md" in bl
+    # the large-mutation doc IS byte-matched to its generator (test_large_mutation), so it
+    # cannot carry a hand banner; its historical status is its version-stamped title plus
+    # the current-results doc naming it as the historical source.
+    v07 = _read("docs/BENCHMARK_v0.7.0.md")
+    assert "(v0.7.0)" in v07, "the large-mutation doc title must carry its version stamp"
+    cur = _read("docs/BENCHMARK_CURRENT.md")
+    assert "BENCHMARK_v0.7.0.md" in cur and "historical" in cur.lower()
 
 
 def test_current_benchmark_doc_is_version_and_commit_stamped() -> None:

From a6595baaf02f285ea21472caa48b85c5eeddf3a0 Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 18:31:59 +0530
Subject: [PATCH 07/13] fix(v1): resolve all 6 adversarial-review BLOCK
 findings + 2 hygiene items
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A 5-lens adversarial review (BLOCK verdict) reproduced 6 real defects; all fixed
red-green, plus two hygiene items that contradicted stated invariants:

1. Config-key over-binding broke "default unchanged unless opt-in": a claim backticking a
   common config word (e.g. `dependencies`) bound pyproject.toml and could newly refuse a
   clean `verify` with exit 6. Fix: _CONFIG_KEY_STOPWORDS (PEP 621 / common keys) on the
   config axis; specific keys (max_workers) still bind. Regression test reproduces the exit-6.
2/3. SECURITY.md + action/README.md still said trusted-base was "not yet implemented" —
   false on this branch. Updated both to describe checker_trust: base as shipped (with the
   non-sandbox residual); added a guard test so the drift can't recur.
4. `code:` false PASS on an f-string docstring — code_only_python now recognises
   ast.JoinedStr docstrings.
5. `code:` false FAIL on a real string co-located on a docstring's line — docstrings are
   now blanked by AST node SPAN, not whole line.
6. `py-const` PASSed on value-TYPE drift (30/30.0, 1/True, 0/False) via Python == — now
   requires matching type before ==. Documented + red-green tested.
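
Illustration of item 6 (a minimal sketch, not the pyast.py code; `const_matches` is a
hypothetical name used only here): plain `==` treats 30/30.0 and 1/True as equal, so the
exact type is compared before the value.

```python
def const_matches(actual: object, expected: object) -> bool:
    """Return True only when both the exact type and the value agree."""
    # bool is a subclass of int, so `1 == True` holds under plain ==;
    # comparing exact types first rejects that drift before values are compared.
    return type(actual) is type(expected) and actual == expected


if __name__ == "__main__":
    print(const_matches(30, 30))    # same type, same value
    print(const_matches(30, 30.0))  # int vs float, although 30 == 30.0
    print(const_matches(1, True))   # int vs bool, although 1 == True
```

An exact `type(...) is type(...)` test is used in the sketch rather than `isinstance`,
because `isinstance(True, int)` is true and would let the 1/True case through.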

Hygiene: warrant-quality _run_mutated refuses a `../`-escaping file operand (its docstring
promised containment); check_signature wraps comparison in _PARSE_ERRORS so a pathological
signature ERRORs within pyast. Added an end-to-end ERROR-never-BROKEN test for the new C3
forms (non-literal RHS -> ERRORED, exit 5, never BROKEN).
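
Illustration of the `../` containment rule (a sketch under assumptions, not the
bench/warrant_quality.py code; `stays_inside` is a hypothetical helper): resolve the
operand against the repo root and refuse it unless the result is still under that root.

```python
from pathlib import Path


def stays_inside(root: Path, operand: str) -> bool:
    """True when `operand`, resolved against `root`, does not leave `root`."""
    base = root.resolve()
    # resolve() collapses `..` segments; an absolute operand replaces `base` entirely,
    # so it is also rejected unless it happens to point back inside the root.
    target = (base / operand).resolve()
    return target == base or base in target.parents


if __name__ == "__main__":
    repo = Path(".")
    print(stays_inside(repo, "src/config.py"))
    print(stays_inside(repo, "../outside.py"))
```

The check is on the resolved path, not on the operand text, so `src/../../x.py` is
refused as well as a leading `../`.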

658 non-slow tests pass; lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 SECURITY.md                            | 23 ++++++---
 action/README.md                       | 29 ++++++-----
 bench/warrant_quality.py               | 16 ++++--
 src/dorian/pyast.py                    | 71 ++++++++++++++++----------
 src/dorian/symbol_index.py             | 66 +++++++++++++++++++++++-
 tests/test_action_security_defaults.py | 13 +++++
 tests/test_config_binding.py           | 50 ++++++++++++++++++
 tests/test_pystructural.py             | 59 +++++++++++++++++++++
 tests/test_semantic_context.py         | 18 +++++++
 9 files changed, 290 insertions(+), 55 deletions(-)

diff --git a/SECURITY.md b/SECURITY.md
index c746999..b1d973b 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -64,14 +64,21 @@ inside your own sandbox (container, restricted user, no secrets in env).
 
 ## Public fork PR CI
 
-dorian does **not** currently advertise a safe public-fork-PR mode. A trusted-base
-Action design (run checkers from the base ref, never from untrusted head, deny-exec
-by default for forks) is documented in
-[docs/TRUSTED_BASE_ACTION_DESIGN.md](docs/TRUSTED_BASE_ACTION_DESIGN.md) but is **not
-yet implemented or tested**. Until it is, the safe answer for public forks is
-`--deny-exec` plus the standard caution that any executed checker still runs with
-the runner's privileges. Do not wire dorian into `pull_request_target` with a
-checkout of untrusted head.
+For public/forked-PR CI, use **trusted-base checker-source mode**:
+`dorian revalidate --checker-source base` (Action input `checker_trust: base`). It
+resolves each claim's checker SPEC from the trusted base ref and runs it against the
+PR-head sources, so a PR-added or PR-modified executable checker is never executed and
+a PR rewriting a checker spec cannot self-attest a verdict (the base-approved spec
+wins). A missing or tampered base sidecar **fails closed** (ERRORED, never executed).
+This is implemented and proven by the test matrix in
+[docs/TRUSTED_BASE_ACTION_DESIGN.md](docs/TRUSTED_BASE_ACTION_DESIGN.md) §6
+(`tests/test_trusted_base.py`).
+
+It is a **checker-source trust root, not a sandbox**: a base-approved `pytest:` checker
+can still import and execute PR-head code. So for fully untrusted forks, combine
+`checker_trust: base` **with `deny_exec: true`** (or external isolation) — any executed
+checker still runs with the runner's privileges. Do not wire dorian into
+`pull_request_target` with a checkout of untrusted head.
 
 ## Supported versions
 
diff --git a/action/README.md b/action/README.md
index 232cc58..558c19e 100644
--- a/action/README.md
+++ b/action/README.md
@@ -59,20 +59,21 @@ caveats:
 1. A `.warrant` file is a **non-obvious executable input**. Reviewers who
    would scrutinize a workflow or `conftest.py` change may wave through a
    "docs-only" diff that swaps a checker `program`.
-2. The verdict is **self-attested by the PR tree**. A PR can rewrite a
-   sidecar so a broken claim re-verifies; the trust root for what "should"
-   be checked is not yet the base branch.
-
-**deny-exec input (partial mitigation, available now).** Set `deny_exec: true`
-(or `deny_shell: true`) on the Action to refuse the executable checker families
-during revalidation: C4 pytest and C5 shell ERROR instead of executing, so a
-PR-authored sidecar cannot make this Action run its code. It flows through the
-`DORIAN_DENY_EXEC` env fallback; the default `false` preserves today's behavior
-for trusted/internal repos. This is fail-closed but **not a sandbox** and **not
-yet a full public-fork story**: it removes code execution but does not address
-the self-attested-verdict problem (a PR can still rewrite a *non-executable* C3
-claim so a broken fact re-verifies). See `SECURITY.md` and
-`docs/SECURITY_BOUNDARY.md`.
+2. In the default `head` mode the verdict is **self-attested by the PR tree** — a
+   PR can rewrite a sidecar so a broken claim re-verifies. **`checker_trust: base`
+   fixes exactly this** (see below): it sources every checker spec from the base
+   ref, so a PR rewriting a spec can no longer weaken the verdict. Use `head` only
+   for trusted/internal repos.
+
+**deny-exec input.** Set `deny_exec: true` (or `deny_shell: true`) on the Action to
+refuse the executable checker families during revalidation: C4 pytest and C5 shell
+ERROR instead of executing, so a PR-authored sidecar cannot make this Action run its
+code. It flows through the `DORIAN_DENY_EXEC` env fallback; the default `false`
+preserves today's behavior for trusted/internal repos. This is fail-closed but **not
+a sandbox**: on its own it removes code execution but does not address the
+self-attested-verdict problem for *non-executable* checkers — that is what
+`checker_trust: base` adds, and the two compose (use both for untrusted forks). See
+`SECURITY.md` and `docs/SECURITY_BOUNDARY.md`.
 
 ```yaml
 # untrusted / public-fork posture
diff --git a/bench/warrant_quality.py b/bench/warrant_quality.py
index 3b0862d..e7ab4de 100644
--- a/bench/warrant_quality.py
+++ b/bench/warrant_quality.py
@@ -96,11 +96,19 @@ def _mutations(spec: CheckerSpec) -> Iterator[tuple[str, str, str, object]]:
 
 def _run_mutated(repo: Path, claim: Claim, spec_index: int, file: str, mutate, policy) -> Verdict:
     """Run one checker against a throwaway copy of `file` with `mutate` applied. Only the
-    one file the checker reads is materialized — the real repo is never touched."""
-    original = (repo / file).read_text(encoding="utf-8", errors="replace")
+    one file the checker reads is materialized — the real repo is never touched, and a
+    warrant-controlled `file` operand that escapes the repo (e.g. `../`) is refused so the
+    harness cannot read or write outside its sandbox (the checker would ERROR on it anyway)."""
+    repo = repo.resolve()
+    src = (repo / file).resolve()
+    if not src.is_relative_to(repo) or not src.is_file():
+        return Verdict.ERROR  # path escape or missing: do not read/write outside the repo
+    original = src.read_text(encoding="utf-8", errors="replace")
     with tempfile.TemporaryDirectory() as td:
-        work = Path(td)
-        target = work / file
+        work = Path(td).resolve()
+        target = (work / file).resolve()
+        if not target.is_relative_to(work):
+            return Verdict.ERROR  # write would escape the temp sandbox
         target.parent.mkdir(parents=True, exist_ok=True)
         target.write_text(mutate(original), encoding="utf-8")
         ctx = CheckContext(repo=work, claim=claim, policy=policy)
diff --git a/src/dorian/pyast.py b/src/dorian/pyast.py
index fb2c4b0..6a7ab4e 100644
--- a/src/dorian/pyast.py
+++ b/src/dorian/pyast.py
@@ -43,23 +43,10 @@ def code_only_python(text: str) -> str | None:
     tree = _parse(text)
     if tree is None:
         return None
-    doc_start_lines: set[int] = set()
-    for node in ast.walk(tree):
-        if isinstance(node, _SCOPE_NODES):
-            body = getattr(node, "body", None)
-            if (
-                isinstance(body, list)
-                and body
-                and isinstance(body[0], ast.Expr)
-                and isinstance(body[0].value, ast.Constant)
-                and isinstance(body[0].value.value, str)
-            ):
-                doc_start_lines.add(body[0].value.lineno)
-
     buf = [list(line) for line in text.split("\n")]
 
-    def blank(start: tuple[int, int], end: tuple[int, int]) -> None:
-        (sl, sc), (el, ec) = start, end
+    def blank(sl: int, sc: int, el: int, ec: int) -> None:
+        """Blank the half-open span (sl,sc)..(el,ec) to spaces. Lines 1-based, cols 0-based."""
         for ln in range(sl, el + 1):
             if ln - 1 >= len(buf):
                 break
@@ -69,12 +56,28 @@ def blank(start: tuple[int, int], end: tuple[int, int]) -> None:
             for i in range(lo, min(hi, len(row))):
                 row[i] = " "
 
+    # Docstrings: the first body statement of a module/class/function that is a bare string
+    # OR f-string expression. Blank by the NODE's span (not the whole line), so a real string
+    # literal co-located on the docstring's physical line is preserved; an f-string docstring
+    # (ast.JoinedStr) is blanked just like a plain one.
+    for node in ast.walk(tree):
+        if not isinstance(node, _SCOPE_NODES):
+            continue
+        body = getattr(node, "body", None)
+        if not (isinstance(body, list) and body and isinstance(body[0], ast.Expr)):
+            continue
+        val = body[0].value
+        is_doc = (isinstance(val, ast.Constant) and isinstance(val.value, str)) or isinstance(
+            val, ast.JoinedStr
+        )
+        if is_doc and val.end_lineno is not None and val.end_col_offset is not None:
+            blank(val.lineno, val.col_offset, val.end_lineno, val.end_col_offset)
+
+    # Comments: char-accurate via tokenize.
     try:
         for tok in tokenize.generate_tokens(io.StringIO(text).readline):
-            if tok.type == tokenize.COMMENT or (
-                tok.type == tokenize.STRING and tok.start[0] in doc_start_lines
-            ):
-                blank(tok.start, tok.end)
+            if tok.type == tokenize.COMMENT:
+                blank(tok.start[0], tok.start[1], tok.end[0], tok.end[1])
     except (tokenize.TokenError, IndentationError, SyntaxError):
         pass  # ast parsed cleanly; a tokenizer hiccup leaves best-effort blanking
     return "\n".join("".join(row) for row in buf)
@@ -251,14 +254,23 @@ def check_signature(text: str, needle: str) -> tuple[str, str]:
 
     if async_required and not isinstance(fn, ast.AsyncFunctionDef):
         return ("FAIL", f"signature_mismatch: {qualname} is not async")
-    mismatch = _compare_params(_params(pfn), _params(fn))
-    if mismatch:
-        return ("FAIL", f"signature_mismatch: {qualname}: {mismatch}")
-    if arrow:
-        want_ret = ast.unparse(pfn.returns) if pfn.returns else None
-        got_ret = ast.unparse(fn.returns) if fn.returns else None
-        if want_ret != got_ret:
-            return ("FAIL", f"signature_mismatch: {qualname}: return {got_ret!r} != {want_ret!r}")
+    # normalization/comparison can hit RecursionError/MemoryError on a pathological
+    # (but parseable) signature, e.g. a deeply nested annotation — honor pyast's own
+    # ERROR contract here rather than relying on the run_checker safety net.
+    try:
+        mismatch = _compare_params(_params(pfn), _params(fn))
+        if mismatch:
+            return ("FAIL", f"signature_mismatch: {qualname}: {mismatch}")
+        if arrow:
+            want_ret = ast.unparse(pfn.returns) if pfn.returns else None
+            got_ret = ast.unparse(fn.returns) if fn.returns else None
+            if want_ret != got_ret:
+                return (
+                    "FAIL",
+                    f"signature_mismatch: {qualname}: return {got_ret!r} != {want_ret!r}",
+                )
+    except _PARSE_ERRORS:
+        return ("ERROR", f"signature_uncomparable: {qualname} has a pathological signature")
     return ("PASS", f"signature ok: {qualname}")
 
 
@@ -288,6 +300,9 @@ def check_const(text: str, needle: str) -> tuple[str, str]:
         got = ast.literal_eval(rhs)
     except _PARSE_ERRORS:
         return ("ERROR", f"non_literal: {qualname} is not a literal constant")
-    if got == want:
+    # value AND type must match: Python `==` conflates 30/30.0, 1/True, 0/False, so a
+    # type-only drift (a bool flag becoming an int, an int becoming a float) would wrongly
+    # PASS the tier sold as the strong value verifier. Compare type first.
+    if type(got) is type(want) and got == want:
         return ("PASS", f"const ok: {qualname} == {expected}")
     return ("FAIL", f"const_value_mismatch: {qualname} != {expected}")
diff --git a/src/dorian/symbol_index.py b/src/dorian/symbol_index.py
index f35aff4..ad2ba19 100644
--- a/src/dorian/symbol_index.py
+++ b/src/dorian/symbol_index.py
@@ -42,6 +42,64 @@ class from docs/NEXT_ALGORITHMIC_BETS.md #1 — where a claim about a symbol
 _CONFIG_SUFFIXES = (".toml", ".json")
 _MIN_KEY_LEN = 4  # mirror bindings._MIN_IDENT: shorter keys are noise
 
+# Common PEP 621 / packaging / generic config keys are ordinary English words that appear
+# in claim prose constantly. Binding the config file every time one is mentioned is noise —
+# and worse, it can pull a restricted config file (e.g. pyproject.toml) into the scope-linted
+# read-set and newly refuse a previously-clean seal. So the config axis (like the symbol
+# axis's _BACKTICK_STOPWORDS) skips these common keys; specific keys (max_workers, new_login)
+# still bind. Found by adversarial review: a backticked `dependencies` made verify exit 6.
+_CONFIG_KEY_STOPWORDS = frozenset(
+    {
+        "name",
+        "version",
+        "description",
+        "readme",
+        "license",
+        "authors",
+        "maintainers",
+        "keywords",
+        "classifiers",
+        "dependencies",
+        "scripts",
+        "urls",
+        "homepage",
+        "repository",
+        "documentation",
+        "changelog",
+        "requires",
+        "optional",
+        "project",
+        "build",
+        "tool",
+        "include",
+        "exclude",
+        "packages",
+        "source",
+        "target",
+        "default",
+        "type",
+        "format",
+        "title",
+        "summary",
+        "value",
+        "values",
+        "enabled",
+        "disabled",
+        "options",
+        "settings",
+        "config",
+        "email",
+        "data",
+        "files",
+        "module",
+        "modules",
+        "dependency",
+        "group",
+        "groups",
+        "extras",
+    }
+)
+
 
 def python_symbol_definers(repo: Path) -> dict[str, tuple[str, ...]]:
     """Symbol name -> the sorted, unique git-tracked `.py` files that define it
@@ -221,6 +279,8 @@ def claim_config_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple
     for claim in claims:
         paths: set[str] = set()
         for token in claim_tokens.get(claim.id, ()):
+            if token.lower() in _CONFIG_KEY_STOPWORDS:
+                continue  # common config word: prose, not a key to bind (over-binding/scope)
             files = index.get(token)
             if files is not None and len(files) == 1:
                 paths.add(files[0])
@@ -252,7 +312,11 @@ def ambiguous_config_mentions(
     for claim in claims:
         if not claim.load_bearing or not isinstance(claim.text, str):
             continue
-        ambiguous = {tok: index[tok] for tok in _tokens(claim.text) if len(index.get(tok, ())) > 1}
+        ambiguous = {
+            tok: index[tok]
+            for tok in _tokens(claim.text)
+            if tok.lower() not in _CONFIG_KEY_STOPWORDS and len(index.get(tok, ())) > 1
+        }
         if ambiguous:
             out[claim.id] = ambiguous
     return out
diff --git a/tests/test_action_security_defaults.py b/tests/test_action_security_defaults.py
index 2bee860..f369aff 100644
--- a/tests/test_action_security_defaults.py
+++ b/tests/test_action_security_defaults.py
@@ -61,3 +61,16 @@ def test_security_docs_state_public_fork_limitation() -> None:
         assert "--deny-exec" in doc
         assert "not a sandbox" in low
         assert "fork" in low  # public-fork posture is addressed explicitly
+
+
+def test_security_docs_reflect_trusted_base_as_implemented() -> None:
+    """Regression (adversarial review): trusted-base SHIPPED, so the docs users are routed
+    to must not still say it is unimplemented, and must name the actual surface."""
+    sec = (REPO_ROOT / "SECURITY.md").read_text(encoding="utf-8")
+    action_readme = ACTION_README.read_text(encoding="utf-8")
+    for doc in (sec, action_readme):
+        low = doc.lower()
+        assert "not yet implemented" not in low
+        assert "not yet a full public-fork story" not in low
+        # the actual feature is named (Action input and/or CLI flag)
+        assert "checker_trust" in doc or "checker-source" in doc
diff --git a/tests/test_config_binding.py b/tests/test_config_binding.py
index b97e010..7831b88 100644
--- a/tests/test_config_binding.py
+++ b/tests/test_config_binding.py
@@ -144,6 +144,56 @@ def test_verify_binds_config_and_revalidate_rechecks(tmp_path: Path) -> None:
     assert cli.main(["--repo", str(repo), "revalidate", "--since", base]) == cli.EXIT_REVOKED
 
 
+def test_common_config_key_does_not_bind(tmp_path: Path) -> None:
+    """A common PEP 621 / config word (dependencies, version, name, ...) is English prose,
+    not a specific key to bind — it must NOT auto-watch the config file (over-binding noise,
+    and it can pull a restricted config file into the scope-linted read-set)."""
+    repo = _repo(tmp_path)
+    write(repo, "pyproject.toml", '[project]\nname = "x"\nversion = "0"\ndependencies = []\n')
+    commit_all(repo, "pyproject")
+    claims = [_claim("c", "Updated the `dependencies` and `version`.", "path:pyproject.toml")]
+    assert symbol_index.claim_config_watch_paths(repo, claims) == {}
+    assert symbol_index.ambiguous_config_mentions(repo, claims) == {}
+
+
+def test_specific_config_key_still_binds(tmp_path: Path) -> None:
+    repo = _repo(tmp_path)
+    write(repo, "settings.toml", "[server]\nmax_workers = 4\n")
+    commit_all(repo, "cfg")
+    claims = [_claim("c", "`max_workers` is 4.", "path:settings.toml")]
+    assert symbol_index.claim_config_watch_paths(repo, claims) == {"c": ("settings.toml",)}
+
+
+def test_common_config_word_does_not_newly_refuse_a_clean_seal(tmp_path: Path) -> None:
+    """Backward-compat regression (found by adversarial review): on a repo whose pyproject is
+    under a restricted scope glob, a claim merely backticking `dependencies` must NOT newly
+    refuse `verify` with exit 6 — the checker names note.md, not pyproject.toml."""
+    repo = _repo(tmp_path)
+    write(
+        repo,
+        "pyproject.toml",
+        '[project]\nname = "x"\nversion = "0"\ndependencies = []\n\n'
+        '[tool.dorian.scopes]\nrestricted = ["pyproject.toml"]\n',
+    )
+    write(repo, "note.md", "# n\n\nUpdated the dependencies list.\n")
+    commit_all(repo, "restricted pyproject")
+    claims = {
+        "claims": [
+            {
+                "id": "c",
+                "text": "Updated the `dependencies` list.",
+                "kind": "reference",
+                "load_bearing": False,
+                "checkers": [{"type": "C3", "program": "path:note.md"}],
+            }
+        ]
+    }
+    cp = repo / "claims.json"
+    cp.write_text(json.dumps(claims), encoding="utf-8")
+    rc = cli.main(["--repo", str(repo), "verify", "note.md", "--claims", str(cp)])
+    assert rc == 0  # `dependencies` must not pull restricted pyproject.toml into the read-set
+
+
 def test_bind_suggest_shows_config_provenance(tmp_path: Path, capsys) -> None:
     repo = _repo(tmp_path)
     write(repo, "settings.toml", "[server]\nmax_workers = 4\n")
diff --git a/tests/test_pystructural.py b/tests/test_pystructural.py
index 76976e4..6fd94fe 100644
--- a/tests/test_pystructural.py
+++ b/tests/test_pystructural.py
@@ -232,6 +232,21 @@ def test_py_const_comment_and_docstring_survival_does_not_pass(tmp_path: Path) -
     assert _run(tmp_path, "py-const:c.py::TIMEOUT::30").verdict is Verdict.FAIL
 
 
+def test_py_const_rejects_value_type_drift(tmp_path: Path) -> None:
+    """The 'strong value verifier' must not let a value's TYPE drift past on Python ==:
+    30 != 30.0, 1 != True, 0 != False. Otherwise a bool flag silently becoming an int, or
+    an int timeout becoming a float, would re-verify green."""
+    _w(tmp_path, "c.py", "TIMEOUT = 30.0\nFLAG = 1\nZERO = 0\n")
+    assert _run(tmp_path, "py-const:c.py::TIMEOUT::30").verdict is Verdict.FAIL  # int vs float
+    assert _run(tmp_path, "py-const:c.py::FLAG::True").verdict is Verdict.FAIL  # int vs bool
+    assert _run(tmp_path, "py-const:c.py::ZERO::False").verdict is Verdict.FAIL  # int vs bool
+    # same value AND type still passes
+    _w(tmp_path, "c.py", "TIMEOUT = 30\nRATE = 0.5\nFLAG = True\n")
+    assert _run(tmp_path, "py-const:c.py::TIMEOUT::30").verdict is Verdict.PASS
+    assert _run(tmp_path, "py-const:c.py::RATE::0.5").verdict is Verdict.PASS
+    assert _run(tmp_path, "py-const:c.py::FLAG::True").verdict is Verdict.PASS
+
+
 # --- end-to-end: the new forms bind, seal born-verifiable, and re-check ---------
 
 
@@ -288,3 +303,47 @@ def test_structural_forms_verify_seal_and_revalidate(fixture_repo: Path) -> None
     apply_three_change_commit(fixture_repo)
     rc = cli.main(["--repo", str(fixture_repo), "revalidate", "--since", base])
     assert rc == cli.EXIT_REVOKED  # a load-bearing claim broke -> exit 4
+
+
+def test_new_form_error_folds_to_errored_not_broken(fixture_repo: Path) -> None:
+    """A py-const claim whose RHS becomes NON-LITERAL on a later edit ERRORs (the value
+    cannot be determined) — it must fold to ERRORED (exit 5), never BROKEN. Pins the
+    ERROR-never-BROKEN invariant end-to-end for the new C3 forms."""
+    import json
+
+    from conftest import commit_all, git, write
+    from dorian import cli
+    from dorian.revalidate import revalidate
+
+    claims = {
+        "claims": [
+            {
+                "id": "timeout",
+                "text": "The default request timeout is 30 seconds.",
+                "kind": "quantity",
+                "load_bearing": True,
+                "checkers": [{"type": "C3", "program": "py-const:src/config.py::TIMEOUT::30"}],
+            }
+        ]
+    }
+    (fixture_repo / "claims.json").write_text(json.dumps(claims), encoding="utf-8")
+    base = git(fixture_repo, "rev-parse", "HEAD")
+    assert (
+        cli.main(
+            [
+                "--repo",
+                str(fixture_repo),
+                "verify",
+                "docs/design.md",
+                "--claims",
+                str(fixture_repo / "claims.json"),
+            ]
+        )
+        == 0
+    )
+    write(fixture_repo, "src/config.py", "TIMEOUT = compute_timeout()\nRETRIES = 3\n")
+    commit_all(fixture_repo, "timeout becomes a non-literal")
+    res = revalidate(fixture_repo, since=base)
+    assert {cid for _, cid, _ in res.errored} == {"timeout"}
+    assert res.broken == []  # ERROR is never BROKEN
+    assert res.exit_code == cli.EXIT_ERRORED
diff --git a/tests/test_semantic_context.py b/tests/test_semantic_context.py
index d045988..80949ad 100644
--- a/tests/test_semantic_context.py
+++ b/tests/test_semantic_context.py
@@ -104,3 +104,21 @@ def test_code_bad_regex_is_error(tmp_path: Path) -> None:
 
 def test_code_path_escape_is_error(tmp_path: Path) -> None:
     assert _run(tmp_path, "code:../../etc/passwd::root").verdict is Verdict.ERROR
+
+
+def test_code_ignores_fstring_docstring_survival(tmp_path: Path) -> None:
+    """A fact surviving only in an f-string used as a (dead) doc statement must FAIL —
+    docstring detection must not be fooled by the f-string (ast.JoinedStr) form."""
+    _w(
+        tmp_path,
+        "m.py",
+        'def handler():\n    f"""serves /v1/login historically {1}."""\n    return 200\n',
+    )
+    assert _run(tmp_path, "code:m.py::/v1/login").verdict is Verdict.FAIL
+
+
+def test_code_keeps_real_string_co_located_on_docstring_line(tmp_path: Path) -> None:
+    """A genuine string literal sharing a physical line with a docstring must be KEPT —
+    docstring blanking is by AST node span, not by whole line."""
+    _w(tmp_path, "m.py", 'class C:\n    """DOC"""; ROUTE = "/v1/keepme"\n')
+    assert _run(tmp_path, "code:m.py::/v1/keepme").verdict is Verdict.PASS
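
Note on the `py-const:` change in this patch: the type-then-value comparison can be sketched in isolation. This is a minimal illustration, not the code in `dorian/pyast.py`; the helper name `strict_const_equal` and its string-in, bool-out shape are assumptions made for the example.

```python
import ast


def strict_const_equal(actual_src: str, expected_src: str) -> bool:
    """Return True only when both literals have the same type and the same value."""
    got = ast.literal_eval(actual_src)
    want = ast.literal_eval(expected_src)
    # Plain `==` treats 30 == 30.0 and 1 == True as equal, so compare types first.
    return type(got) is type(want) and got == want


if __name__ == "__main__":
    pairs = [("30", "30"), ("30.0", "30"), ("1", "True"), ("0", "False")]
    for actual, expected in pairs:
        print(f"{actual!r} vs {expected!r}: {strict_const_equal(actual, expected)}")
```

The sketch checks the type of the top-level value only: `[1]` and `[True]` are both lists, so they still compare equal here.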

From b7376e7762571e7c802c220aa50c241d2dae7e39 Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 18:37:16 +0530
Subject: [PATCH 08/13] release: bump to 1.0.0rc1 (V1 release candidate)

All V1 strengthening work packages (WP1-WP9) are implemented, tested, and documented;
the 5-lens adversarial review's BLOCK findings are all resolved with regression tests;
733 tests pass (incl. slow); lint clean. Bump the three version surfaces
(pyproject / __init__ / uv.lock) to the V1 release candidate. No tag, push, or publish.

rc1 (not final 1.0.0) is honest: the candidate invites real-repo benchmark validation, and
the explicitly deferred post-V1 items (declarative-structural checkers, route/SQL binding
indices, YAML config binding, audit-event atomicity) are documented in docs/V1_SCOPE.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 pyproject.toml         | 2 +-
 src/dorian/__init__.py | 2 +-
 uv.lock                | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/pyproject.toml b/pyproject.toml
index 6806410..a762cc4 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "dorian-vwp"
-version = "0.11.0"
+version = "1.0.0rc1"
 description = "Hold AI agents to what they said they did: deterministic, token-free verification of claims about a change."
 readme = "README.md"
 requires-python = ">=3.11"
diff --git a/src/dorian/__init__.py b/src/dorian/__init__.py
index 4dcd06e..df260c7 100644
--- a/src/dorian/__init__.py
+++ b/src/dorian/__init__.py
@@ -3,4 +3,4 @@
 PyPI distribution: `dorian-vwp`; import package: `dorian`; CLI: `dorian`.
 """
 
-__version__ = "0.11.0"
+__version__ = "1.0.0rc1"
diff --git a/uv.lock b/uv.lock
index 17f1af9..5803a73 100644
--- a/uv.lock
+++ b/uv.lock
@@ -184,7 +184,7 @@ wheels = [
 
 [[package]]
 name = "dorian-vwp"
-version = "0.11.0"
+version = "1.0.0rc1"
 source = { editable = "." }
 
 [package.optional-dependencies]

From 47106042305a64458ec1cd3d02bddf43765fab29 Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 18:40:25 +0530
Subject: [PATCH 09/13] docs: re-stamp BENCHMARK_CURRENT at 1.0.0rc1 (numbers
 re-confirmed post-fix)

Re-ran large-mutation / binding-lifecycle / realworld at commit b7376e7 (1.0.0rc1),
after the adversarial-review fixes: figures identical (large-mutation P=R=0.93,
11.6x/10.4x; binding-lifecycle 808 pairs 0.54->1.00 selection, 1.00 alarm; realworld
2/1/2), confirming the fixes don't touch the benchmarked paths. Version/commit stamps
updated; the version-stamp evidence test now reads the live pyproject version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/BENCHMARK_CURRENT.md        | 9 +++++++--
 tests/test_benchmark_evidence.py | 5 ++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/docs/BENCHMARK_CURRENT.md b/docs/BENCHMARK_CURRENT.md
index daa5f5d..48cf898 100644
--- a/docs/BENCHMARK_CURRENT.md
+++ b/docs/BENCHMARK_CURRENT.md
@@ -10,12 +10,17 @@ and are kept as-is for provenance.
 
 | field | value |
 | --- | --- |
-| dorian version | `0.11.0` (V1 candidate) |
-| measured commit | `2a66a49eee7b8aa069d7fb9222572b272493856d` |
+| dorian version | `1.0.0rc1` (V1 release candidate) |
+| measured commit | `b7376e7762571e7c802c220aa50c241d2dae7e39` |
 | Python | 3.12.4 |
 | platform | darwin (CI matrix: 3.11 / 3.12 / 3.13) |
 | reproduce | `dorian bench large-mutation` · `dorian bench binding-lifecycle` · `dorian bench realworld-usecases` |
 
+These numbers were re-run at the `1.0.0rc1` commit *after* the adversarial-review fixes
+landed, confirming those fixes (py-const type check, `code:` docstring handling, config-key
+stopwords) did not move the benchmark figures — expected, since the suites exercise C1/C3
+(symbol/regex/string/path)/C5, not the new structural/config-binding paths.
+
 ## Results
 
 ### Large controlled-mutation (240 pairs, 6 synthetic domains)
diff --git a/tests/test_benchmark_evidence.py b/tests/test_benchmark_evidence.py
index af4eee6..640e955 100644
--- a/tests/test_benchmark_evidence.py
+++ b/tests/test_benchmark_evidence.py
@@ -34,8 +34,11 @@ def test_historical_benchmark_docs_are_labeled_historical() -> None:
 
 
 def test_current_benchmark_doc_is_version_and_commit_stamped() -> None:
+    import tomllib
+
     doc = _read("docs/BENCHMARK_CURRENT.md")
-    assert "0.11.0" in doc  # dorian version stamp
+    version = tomllib.loads(_read("pyproject.toml"))["project"]["version"]
+    assert version in doc, f"current benchmark doc must stamp the live version {version!r}"
     assert "measured commit" in doc.lower()
     assert "Python" in doc  # environment summary
     assert "reproduce" in doc.lower()

From 334910ceaa81f3779523902fa1d2f679c7e9cdd4 Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 18:50:21 +0530
Subject: [PATCH 10/13] docs: V1 alignment report + tracker (WP10 release
 decision)

Final evidence-backed report: version gate, per-WP status, commands+results, verification
evidence, trigger-vs-truth preservation, security posture, benchmark posture, remaining
risks/non-goals, and the release decision (1.0.0rc1 candidate; no tag/push/publish).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 V1_ALIGNMENT_REPORT.md       | 141 +++++++++++++++++++++++++++++++++++
 V1_IMPLEMENTATION_TRACKER.md |  10 ++-
 2 files changed, 149 insertions(+), 2 deletions(-)
 create mode 100644 V1_ALIGNMENT_REPORT.md

diff --git a/V1_ALIGNMENT_REPORT.md b/V1_ALIGNMENT_REPORT.md
new file mode 100644
index 0000000..fa4da7b
--- /dev/null
+++ b/V1_ALIGNMENT_REPORT.md
@@ -0,0 +1,141 @@
+# V1 alignment report
+
+Final report for the v0.11.0 → V1 strengthening program driven by
+`RESEARCH_REPORT_DORIAN_0_11_0.md`. Every completion claim below is backed by a file
+path and a command/test result. Behavior was verified against the current code; where
+the report and code disagreed, code won (recorded in `V1_IMPLEMENTATION_TRACKER.md`).
+
+## 1. Version gate result
+
+| surface | start | final |
+|---|---|---|
+| `pyproject.toml` `[project].version` | `0.11.0` | `1.0.0rc1` |
+| `src/dorian/__init__.py` `__version__` | `0.11.0` | `1.0.0rc1` |
+| `dorian --version` | `dorian 0.11.0` | `dorian 1.0.0rc1` |
+| branch | `main` @ `78dcd1a` | `dorian-v1-strengthening` @ `4710604` |
+
+Version gate **PASSED** at start (both surfaces `0.11.0`). No tag, push, publish, or
+remote change performed.
+
+## 2. Executive result
+
+**V1 release candidate ready** (`1.0.0rc1`). All ten work packages are implemented (or
+explicitly deferred with reasons), tested, and documented; a 5-lens adversarial review
+returned BLOCK and all six must-fix findings are resolved with regression tests; the full
+733-test suite and lint pass at the release commit.
+
+## 3. Work completed
+
+| WP | Status | Files | Tests | Caveat |
+|---|---|---|---|---|
+| WP1 docs/evidence hygiene | complete | README, docs/V1_SCOPE.md, BENCHMARK_v0.7.0/BINDING_LIFECYCLE banners, BENCHMARK_CURRENT.md | test_benchmark_evidence (5) | trust-state legend + historical labels |
+| WP2 checker-strength / claim-risk | complete | src/dorian/strength.py, commands.py (bindings + binding-gate) | test_strength (20) | advisory only; never changes verdict/exit |
+| WP3 Python structural checkers | complete | src/dorian/pyast.py, checkers/c3_ref.py, seal.py, spec/checkers.md | test_pystructural (29) | gutted-body is the documented ceiling |
+| WP4 semantic-context `code:` | complete | pyast.code_only_python, c3_ref.py | test_semantic_context (14) | Python-only (documented) |
+| WP5 multi-index binding (config-key) | complete | symbol_index.py (config_key_index, claim_watch_paths), commands.py | test_config_binding (12) | TOML/JSON only; YAML excluded (zero-dep) |
+| WP6 C4 test-adequacy lint | complete | strength.c4_adequacy | (in test_strength) | advisory; conservative on helpers |
+| WP7 trusted-base checker-source | complete | revalidate.py, cli.py, commands.py, action/action.yml | test_trusted_base (10) | trust root, NOT a sandbox |
+| WP8 warrant-quality harness | complete | bench/warrant_quality.py, commands.py | test_warrant_quality (7) | structural/existence forms scored; others strength-only |
+| WP9 current-version benchmarks | complete | docs/BENCHMARK_CURRENT.md | docs wording tests | synthetic-suite reproducibility only |
+| WP10 release prep | complete | pyproject/__init__/uv.lock → 1.0.0rc1 | test_version_sync (3) | rc, not final 1.0.0 |
+
+**Deferred (classified in `docs/V1_SCOPE.md`, not V1 blockers):** declarative-structural
+checkers (config/OpenAPI/SQL value/type — the report's C7-style family), route/SQL binding
+indices, YAML config binding (needs a runtime dep), the real-repo public micro-benchmark
+(protocol exists; results post-V1), and audit-event/state single-transaction atomicity
+(pre-existing, documented in `fold.py`).
+
+## 4. Commands run (final state, commit `4710604`)
+
+| command | result |
+|---|---|
+| `uv run dorian --version` | `dorian 1.0.0rc1` |
+| `uv run ruff check src tests bench` | `All checks passed!` |
+| `uv run ruff format --check src tests bench` | `108 files already formatted` |
+| `uv run pytest -m "not slow"` | exit 0 — **658 passed** |
+| `uv run pytest -m slow` | exit 0 — slow suite passed (wheel build, real pytest subprocess, regex-timeout) |
+| `uv run pytest` (full, incl slow) | exit 0 — **733 collected** (baseline 636 → +97) |
+| `dorian bench large-mutation` | 240 pairs, P=R=0.93, 11.6×/10.4× FP reduction |
+| `dorian bench binding-lifecycle` | 808 pairs, selection recall 0.54→1.00, alarm precision/recall 1.00, 0 errored |
+| `dorian bench realworld-usecases` | 5 cases: 2 solved / 1 partial / 2 not_solved |
+| `mcp gitnexus detect_changes` (pre-commit) | changed symbols == intended; no surprise blast radius |
+
+## 5. Verification evidence
+
+- **Test suite:** 733 tests pass at `4710604` (lint + non-slow + slow all exit 0). +97 over
+  the 636-test `78dcd1a` baseline, across 7 new test files
+  (test_pystructural, test_semantic_context, test_strength, test_trusted_base,
+  test_config_binding, test_warrant_quality, test_benchmark_evidence).
+- **CLI smoke:** `dorian bindings <artifact>` shows strength/risk (JSON + human golden tests);
+  `dorian bench warrant-quality --json` emits `dorian-warrant-quality-v1`;
+  `dorian revalidate --checker-source base` and env `DORIAN_CHECKER_SOURCE` both exercised.
+- **Security fixtures:** `tests/test_trusted_base.py` (10) proves each "executed?" case with a
+  sentinel `touch` that must NOT appear under base mode — PR-added and PR-modified executable
+  checkers never run; missing/tampered base sidecar fails closed (ERRORED); deny-exec composes.
+- **Benchmarks:** re-run at `1.0.0rc1`; figures identical to the historical runs (large-mutation
+  vs v0.7.0; binding-lifecycle same content-derived run_id as 0.9.0) — additive, no regression.
+- **Docs wording:** historical docs carry version stamps/banners; `BENCHMARK_CURRENT.md` is
+  version+commit stamped with a what-it-does-NOT-prove block; guard tests pin all of it.
+
+## 6. Trigger-vs-truth preservation
+
+The distinction is preserved and made **more visible**, never blurred:
+
+- **Binding (trigger) stays trigger-only.** Config-key binding (WP5) and symbol binding only
+  widen the re-check set; `docs/VALIDATION_HONESTY.md`, `docs/V1_SCOPE.md`, and the
+  binding-lifecycle benchmark all state a watched-file change never makes a claim BROKEN by itself.
+- **New truth-axis surfacing.** WP2 checker-strength classifies each checker's falsifying power
+  and flags kind-vs-strength **adequacy mismatches** (a `behavior` claim backed only by an
+  existence/text checker; a vacuous pytest node). WP8 warrant-quality scores per-claim
+  caught/missed/brittle/**ceiling** offline.
+- **The ceiling is pinned, not hidden.** `py-signature:`/`symbol:` on a gutted-body change PASS
+  (a `test_..._gutted_body_still_passes_documented_ceiling` test asserts it); only a C4 test
+  catches a body change. ERROR is never BROKEN — a new end-to-end test drives a new-form ERROR
+  and asserts it lands in `errored` (exit 5), never `broken`.
+
+## 7. Security posture
+
+- **Trusted/internal (`head`, default):** unchanged from v0.11.0 — executes the checked-out
+  checker specs. Correct where everyone who can open a PR is trusted to run code in CI.
+- **Public/fork (`checker_trust: base`):** **implemented and tested** (WP7). Resolves each
+  claim's checker spec from the base ref, so PR-added/modified executable checkers never run and
+  a rewritten checker cannot self-attest a verdict; fails closed on a missing/tampered base
+  sidecar. The `SECURITY.md` / `action/README.md` contradictions (which still said it was
+  unimplemented) were fixed and a guard test prevents recurrence.
+- **Remaining non-sandbox caveat (stated everywhere):** a base-approved `pytest:` checker can
+  still execute PR-head code. base mode is a **checker-source trust root, not a sandbox** — for
+  fully untrusted forks combine `checker_trust: base` with `deny_exec: true` (or external
+  isolation). `--deny-exec`/`--deny-shell` remain fail-closed, not sandboxes.
+
+## 8. Benchmark / evidence posture
+
+- **Current results:** `docs/BENCHMARK_CURRENT.md` — version+commit stamped (1.0.0rc1 / `b7376e7`),
+  reproduction commands, environment, and an explicit non-overclaim block.
+- **Historical docs labeled:** `BENCHMARK_v0.7.0.md` (version-stamped title; it is byte-matched to
+  its generator so it cannot carry a hand banner) and `BENCHMARK_BINDING_LIFECYCLE.md` (0.9.0,
+  HISTORICAL banner). Both preserved verbatim and cross-referenced from the current doc.
+- **What the benchmarks support:** reproducibility on the named synthetic suites at the stamped
+  version, fewer false re-checks than file watchers, near-complete binding trigger recall with
+  zero false BROKEN — and that V1's additions did not regress any of it. **Not supported:**
+  "works on real repos", "validated", or that binding proves behavior (the gutted-body ceiling).
+
+## 9. Remaining risks and non-goals (after implementation)
+
+- **No real-repo validation yet** — evidence is synthetic-suite reproducibility plus offline
+  public-case reproductions; the public frozen-SHA micro-benchmark is protocol-only (post-V1).
+- **`code:`/structural forms are Python-only**; other languages keep the raw-text survival class.
+- **Config binding is TOML/JSON only** (YAML needs a runtime dep); unparseable supported config
+  files are surfaced, not silently skipped.
+- **Audit-event/state atomicity** — change + event still commit separately (`fold.py`); a crash
+  between them can drop the event. Pre-existing, documented.
+- **`--extract` stays draft/experimental** — not promoted in V1.
+
+## 10. Release decision
+
+**V1 release candidate prepared.** All quality gates passed, so version surfaces were synced to
+`1.0.0rc1` (pyproject / `__init__` / uv.lock; `dorian --version` agrees; `test_version_sync`
+green). It is a **release candidate, not final 1.0.0** — honest given the deferred post-V1 items
+above and the absence of real-repo validation. **No tag, push, publish, or remote/secret change
+was performed**, per the operating rules; the work lives on branch `dorian-v1-strengthening`
+(9 commits off `main`). Suggested next steps (owner's call): open a PR to `main`, then run the
+real-repo public micro-benchmark before promoting `1.0.0rc1 → 1.0.0`.
diff --git a/V1_IMPLEMENTATION_TRACKER.md b/V1_IMPLEMENTATION_TRACKER.md
index 24bcfb7..71dff13 100644
--- a/V1_IMPLEMENTATION_TRACKER.md
+++ b/V1_IMPLEMENTATION_TRACKER.md
@@ -108,6 +108,12 @@ Categories: IMPL=must-implement · TEST=must-test regression · DOC=must-documen
 | WP7 | trusted-base checker-source mode | DONE (revalidate --checker-source base + Action checker_trust; 10-case exploit matrix) |
 | WP8 | warrant-quality mutation harness | DONE (bench/warrant_quality.py; `dorian bench warrant-quality`; deterministic, offline, never mutates real repo; trigger vs verdict; ERROR bucket distinct; honest scope = structural/existence forms scored, others reported strength-only; 7 tests) |
 | WP9 | current-version benchmark results | TODO |
-| WP10 | V1 release prep / decision | TODO |
+| WP10 | V1 release prep / decision | DONE — version surfaces synced to `1.0.0rc1` (pyproject/__init__/uv.lock); no tag/push/publish. All gates pass; adversarial-review BLOCK resolved. |
 
-Commits so far: `58b39e2` (WP3/4/2/6), trusted-base (WP7) next.
+Branch `dorian-v1-strengthening`, 9 commits off `main`:
+`58b39e2` WP3/4/2/6 · `6a8298c` WP7 · `04ab60b` WP5 · `2a66a49` WP8 · `4e586a7` WP9/WP1 ·
+`2a4befa` byte-match fix · `a6595ba` adversarial-review BLOCK fixes · `b7376e7` bump 1.0.0rc1 ·
+`4710604` benchmark re-stamp.
+
+Adversarial review (5 lenses, BLOCK): 6 must-fixes + 2 hygiene items all resolved with
+regression tests. Final gate: ruff clean, 658 non-slow pass, 733 total (incl slow) green.

From 33e9eaf4ae929d2736f14c682e0a55cb04c1a37d Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 20:10:24 +0530
Subject: [PATCH 11/13] audit: reconcile V1 release evidence + fix
 release-blocking doc drift
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Independent release audit (FIXED_NEEDED) findings, all repaired:

Blockers:
- docs/START_HERE.md still called trusted-base "(not yet implemented)" — a user-facing
  CI entry-point doc the prior fix missed; now describes it as implemented (V1).
- docs/BENCHMARK_BINDING_LIFECYCLE.md banner said the current rerun was "0.11.0" while the
  branch is 1.0.0rc1 (and BENCHMARK_CURRENT says 1.0.0rc1) — corrected the version.
- internal program docs (V1_IMPLEMENTATION_TRACKER.md, V1_ALIGNMENT_REPORT.md) were tracked;
  now gitignored and untracked via git rm --cached (kept on disk as provenance). Also
  gitignored: the research report, audit gate, release notes, and tool dirs (.claude/,
  .gitnexus/). docs/V1_SCOPE.md stays tracked (it is a public doc).

Should-fixes:
- docs/ROADMAP_BACKLOG.md trusted-base item flipped DEFER/HUMAN-REVIEW -> SHIPPED (V1).
- c3_ref.py module docstring now documents the code: form (was omitted) and the py-const
  value-AND-type rule.
- action.yml / action/README.md replace the stale 'dorian-vwp==0.6.*' pin example (no PyPI
  release yet) with the git source spec.
- docs/BENCHMARK_CURRENT.md labels the metric commit vs the (docs-only) release commit.

Hardened tests: test_no_live_doc_calls_trusted_base_unimplemented scans ALL live docs (not
just SECURITY.md/action README); warrant-quality path-escape test pins the containment guard;
benchmark-evidence commit-stamp check is version-agnostic. 660 non-slow tests pass; lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .gitignore                             |  11 ++
 V1_ALIGNMENT_REPORT.md                 | 141 -------------------------
 V1_IMPLEMENTATION_TRACKER.md           | 119 ---------------------
 action/README.md                       |   2 +-
 action/action.yml                      |   5 +-
 docs/BENCHMARK_BINDING_LIFECYCLE.md    |   2 +-
 docs/BENCHMARK_CURRENT.md              |  11 +-
 docs/ROADMAP_BACKLOG.md                |  10 +-
 docs/START_HERE.md                     |   5 +-
 src/dorian/checkers/c3_ref.py          |  18 ++--
 tests/test_action_security_defaults.py |  27 +++++
 tests/test_benchmark_evidence.py       |   4 +-
 tests/test_warrant_quality.py          |  25 +++++
 13 files changed, 97 insertions(+), 283 deletions(-)
 delete mode 100644 V1_ALIGNMENT_REPORT.md
 delete mode 100644 V1_IMPLEMENTATION_TRACKER.md

diff --git a/.gitignore b/.gitignore
index eec3828..02c1d4f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,3 +15,14 @@ bench/real/
 .DS_Store
 /assets/
 .env
+
+# tool working dirs (not release content)
+.claude/
+.gitnexus/
+
+# internal program/audit working docs — provenance only, never shipped in the release
+/RESEARCH_REPORT_DORIAN_0_11_0.md
+/V1_IMPLEMENTATION_TRACKER.md
+/V1_ALIGNMENT_REPORT.md
+/AUDIT_RELEASE_GATE.md
+/GITHUB_RELEASE_NOTES.md
diff --git a/V1_ALIGNMENT_REPORT.md b/V1_ALIGNMENT_REPORT.md
deleted file mode 100644
index fa4da7b..0000000
--- a/V1_ALIGNMENT_REPORT.md
+++ /dev/null
@@ -1,141 +0,0 @@
-# V1 alignment report
-
-Final report for the v0.11.0 → V1 strengthening program driven by
-`RESEARCH_REPORT_DORIAN_0_11_0.md`. Every completion claim below is backed by a file
-path and a command/test result. Behavior was verified against the current code; where
-the report and code disagreed, code won (recorded in `V1_IMPLEMENTATION_TRACKER.md`).
-
-## 1. Version gate result
-
-| surface | start | final |
-|---|---|---|
-| `pyproject.toml` `[project].version` | `0.11.0` | `1.0.0rc1` |
-| `src/dorian/__init__.py` `__version__` | `0.11.0` | `1.0.0rc1` |
-| `dorian --version` | `dorian 0.11.0` | `dorian 1.0.0rc1` |
-| branch | `main` @ `78dcd1a` | `dorian-v1-strengthening` @ `4710604` |
-
-Version gate **PASSED** at start (both surfaces `0.11.0`). No tag, push, publish, or
-remote change performed.
-
-## 2. Executive result
-
-**V1 release candidate ready** (`1.0.0rc1`). All ten work packages are implemented (or
-explicitly deferred with reasons), tested, and documented; a 5-lens adversarial review
-returned BLOCK and all six must-fix findings are resolved with regression tests; the full
-733-test suite and lint pass at the release commit.
-
-## 3. Work completed
-
-| WP | Status | Files | Tests | Caveat |
-|---|---|---|---|---|
-| WP1 docs/evidence hygiene | complete | README, docs/V1_SCOPE.md, BENCHMARK_v0.7.0/BINDING_LIFECYCLE banners, BENCHMARK_CURRENT.md | test_benchmark_evidence (5) | trust-state legend + historical labels |
-| WP2 checker-strength / claim-risk | complete | src/dorian/strength.py, commands.py (bindings + binding-gate) | test_strength (20) | advisory only; never changes verdict/exit |
-| WP3 Python structural checkers | complete | src/dorian/pyast.py, checkers/c3_ref.py, seal.py, spec/checkers.md | test_pystructural (29) | gutted-body is the documented ceiling |
-| WP4 semantic-context `code:` | complete | pyast.code_only_python, c3_ref.py | test_semantic_context (14) | Python-only (documented) |
-| WP5 multi-index binding (config-key) | complete | symbol_index.py (config_key_index, claim_watch_paths), commands.py | test_config_binding (12) | TOML/JSON only; YAML excluded (zero-dep) |
-| WP6 C4 test-adequacy lint | complete | strength.c4_adequacy | (in test_strength) | advisory; conservative on helpers |
-| WP7 trusted-base checker-source | complete | revalidate.py, cli.py, commands.py, action/action.yml | test_trusted_base (10) | trust root, NOT a sandbox |
-| WP8 warrant-quality harness | complete | bench/warrant_quality.py, commands.py | test_warrant_quality (7) | structural/existence forms scored; others strength-only |
-| WP9 current-version benchmarks | complete | docs/BENCHMARK_CURRENT.md | docs wording tests | synthetic-suite reproducibility only |
-| WP10 release prep | complete | pyproject/__init__/uv.lock → 1.0.0rc1 | test_version_sync (3) | rc, not final 1.0.0 |
-
-**Deferred (classified in `docs/V1_SCOPE.md`, not V1 blockers):** declarative-structural
-checkers (config/OpenAPI/SQL value/type — the report's C7-style family), route/SQL binding
-indices, YAML config binding (needs a runtime dep), the real-repo public micro-benchmark
-(protocol exists; results post-V1), and audit-event/state single-transaction atomicity
-(pre-existing, documented in `fold.py`).
-
-## 4. Commands run (final state, commit `4710604`)
-
-| command | result |
-|---|---|
-| `uv run dorian --version` | `dorian 1.0.0rc1` |
-| `uv run ruff check src tests bench` | `All checks passed!` |
-| `uv run ruff format --check src tests bench` | `108 files already formatted` |
-| `uv run pytest -m "not slow"` | exit 0 — **658 passed** |
-| `uv run pytest -m slow` | exit 0 — slow suite passed (wheel build, real pytest subprocess, regex-timeout) |
-| `uv run pytest` (full, incl slow) | exit 0 — **733 collected** (baseline 636 → +97) |
-| `dorian bench large-mutation` | 240 pairs, P=R=0.93, 11.6×/10.4× FP reduction |
-| `dorian bench binding-lifecycle` | 808 pairs, selection recall 0.54→1.00, alarm precision/recall 1.00, 0 errored |
-| `dorian bench realworld-usecases` | 5 cases: 2 solved / 1 partial / 2 not_solved |
-| `mcp gitnexus detect_changes` (pre-commit) | changed symbols == intended; no surprise blast radius |
-
-## 5. Verification evidence
-
-- **Test suite:** 733 tests pass at `4710604` (lint + non-slow + slow all exit 0). +97 over
-  the 636-test `78dcd1a` baseline, across 6 new test files
-  (test_pystructural, test_semantic_context, test_strength, test_trusted_base,
-  test_config_binding, test_warrant_quality, test_benchmark_evidence).
-- **CLI smoke:** `dorian bindings <artifact>` shows strength/risk (JSON + human golden tests);
-  `dorian bench warrant-quality --json` emits `dorian-warrant-quality-v1`;
-  `dorian revalidate --checker-source base` and env `DORIAN_CHECKER_SOURCE` both exercised.
-- **Security fixtures:** `tests/test_trusted_base.py` (10) proves each "executed?" case with a
-  sentinel `touch` that must NOT appear under base mode — PR-added and PR-modified executable
-  checkers never run; missing/tampered base sidecar fails closed (ERRORED); deny-exec composes.
-- **Benchmarks:** re-run at `1.0.0rc1`; figures identical to the historical runs (large-mutation
-  vs v0.7.0; binding-lifecycle same content-derived run_id as 0.9.0) — additive, no regression.
-- **Docs wording:** historical docs carry version stamps/banners; `BENCHMARK_CURRENT.md` is
-  version+commit stamped with a what-it-does-NOT-prove block; guard tests pin all of it.
-
-## 6. Trigger-vs-truth preservation
-
-The distinction is preserved and made **more visible**, never blurred:
-
-- **Binding (trigger) stays trigger-only.** Config-key binding (WP5) and symbol binding only
-  widen the re-check set; `docs/VALIDATION_HONESTY.md`, `docs/V1_SCOPE.md`, and the
-  binding-lifecycle benchmark all state a watched-file change never makes a claim BROKEN by itself.
-- **New truth-axis surfacing.** WP2 checker-strength classifies each checker's falsifying power
-  and flags kind-vs-strength **adequacy mismatches** (a `behavior` claim backed only by an
-  existence/text checker; a vacuous pytest node). WP8 warrant-quality scores per-claim
-  caught/missed/brittle/**ceiling** offline.
-- **The ceiling is pinned, not hidden.** `py-signature:`/`symbol:` on a gutted-body change PASS
-  (a `test_..._gutted_body_still_passes_documented_ceiling` test asserts it); only a C4 test
-  catches a body change. ERROR is never BROKEN — a new end-to-end test drives a new-form ERROR
-  and asserts it lands in `errored` (exit 5), never `broken`.
-
-## 7. Security posture
-
-- **Trusted/internal (`head`, default):** unchanged from v0.11.0 — executes the checked-out
-  checker specs. Correct where everyone who can open a PR is trusted to run code in CI.
-- **Public/fork (`checker_trust: base`):** **implemented and tested** (WP7). Resolves each
-  claim's checker spec from the base ref, so PR-added/modified executable checkers never run and
-  a rewritten checker cannot self-attest a verdict; fails closed on a missing/tampered base
-  sidecar. The `SECURITY.md` / `action/README.md` contradictions (which still said it was
-  unimplemented) were fixed and a guard test prevents recurrence.
-- **Remaining non-sandbox caveat (stated everywhere):** a base-approved `pytest:` checker can
-  still execute PR-head code. base mode is a **checker-source trust root, not a sandbox** — for
-  fully untrusted forks combine `checker_trust: base` with `deny_exec: true` (or external
-  isolation). `--deny-exec`/`--deny-shell` remain fail-closed, not sandboxes.
-
-## 8. Benchmark / evidence posture
-
-- **Current results:** `docs/BENCHMARK_CURRENT.md` — version+commit stamped (1.0.0rc1 / `b7376e7`),
-  reproduction commands, environment, and an explicit non-overclaim block.
-- **Historical docs labeled:** `BENCHMARK_v0.7.0.md` (version-stamped title; it is byte-matched to
-  its generator so it cannot carry a hand banner) and `BENCHMARK_BINDING_LIFECYCLE.md` (0.9.0,
-  HISTORICAL banner). Both preserved verbatim and cross-referenced from the current doc.
-- **What the benchmarks support:** reproducibility on the named synthetic suites at the stamped
-  version, fewer false re-checks than file watchers, near-complete binding trigger recall with
-  zero false BROKEN — and that V1's additions did not regress any of it. **Not supported:**
-  "works on real repos", "validated", or that binding proves behavior (the gutted-body ceiling).
-
-## 9. Remaining risks and non-goals (after implementation)
-
-- **No real-repo validation yet** — evidence is synthetic-suite reproducibility plus offline
-  public-case reproductions; the public frozen-SHA micro-benchmark is protocol-only (post-V1).
-- **`code:`/structural forms are Python-only**; other languages keep the raw-text survival class.
-- **Config binding is TOML/JSON only** (YAML needs a runtime dep); unparseable supported config
-  files are surfaced, not silently skipped.
-- **Audit-event/state atomicity** — change + event still commit separately (`fold.py`); a crash
-  between them can drop the event. Pre-existing, documented.
-- **`--extract` stays draft/experimental** — not promoted in V1.
-
-## 10. Release decision
-
-**V1 release candidate prepared.** All quality gates passed, so version surfaces were synced to
-`1.0.0rc1` (pyproject / `__init__` / uv.lock; `dorian --version` agrees; `test_version_sync`
-green). It is a **release candidate, not final 1.0.0** — honest given the deferred post-V1 items
-above and the absence of real-repo validation. **No tag, push, publish, or remote/secret change
-was performed**, per the operating rules; the work lives on branch `dorian-v1-strengthening`
-(9 commits off `main`). Suggested next steps (owner's call): open a PR to `main`, then run the
-real-repo public micro-benchmark before promoting `1.0.0rc1 → 1.0.0`.
diff --git a/V1_IMPLEMENTATION_TRACKER.md b/V1_IMPLEMENTATION_TRACKER.md
deleted file mode 100644
index 71dff13..0000000
--- a/V1_IMPLEMENTATION_TRACKER.md
+++ /dev/null
@@ -1,119 +0,0 @@
-# V1 implementation tracker
-
-Working tracker for the v0.11.0 → V1 strengthening program driven by
-`RESEARCH_REPORT_DORIAN_0_11_0.md`. Behavior is verified against the **current
-code**, not the report; where they disagree, code wins and the disagreement is
-recorded here.
-
-## Phase 0 — version gate + scope evidence
-
-**Version gate: PASSED.**
-
-| Surface | Observed |
-|---|---|
-| `pyproject.toml` `[project].version` | `0.11.0` |
-| `src/dorian/__init__.py` `__version__` | `0.11.0` |
-| branch | `main` |
-| commit SHA (start) | `78dcd1a6a242110e55dc31fd1db2e811de3e3898` |
-| working tree | clean except untracked `.claude/`, `AGENTS.md`, `CLAUDE.md`, `RESEARCH_REPORT_DORIAN_0_11_0.md` |
-| Python | 3.12.4 |
-| toolchain | `uv` 0.5.9; `uv run pytest`; ruff for lint/format |
-| baseline tests | `uv run pytest -m "not slow"` → **561 passed, exit 0**; 636 total incl. slow |
-
-## Phase 1 — baseline reconstruction (from current code)
-
-### Module map
-- `model.py` — `Warrant`/`Claim`/`CheckerSpec`/`ReadSetEntry`, content-addressed id, canonical JSON. `CheckerType = C1|C3|C4|C5` (a *Literal* hint; registry dispatch is on the string `type`).
-- `checkers/base.py` — `run_checker` is the single dispatch + the single execution-policy gate (blocked → `Verdict.ERROR`).
-- `checkers/c1_span.py` — span anchor, relocation-tolerant, optional c2lite.
-- `checkers/c3_ref.py` — `path:` / `symbol:` / `string:` / `regex:`; regex match in a spawn-killed worker (ReDoS backstop).
-- `checkers/c4_test.py` — `pytest:<nodeid>`, careful exit-code mapping; ERROR≠FAIL.
-- `checkers/c5_data.py` — typed data forms + opaque `shell:`.
-- `policy.py` — `ExecutionPolicy`, `executable_kind` (single source of "what executes": C4=pytest, C5 shell=shell).
-- `seal.py` — born-verifiable seal; scope lint; watch derivation; additive symbol-definer widening; duplicate-id reject; atomic write; idempotent re-seal.
-- `revalidate.py` — changed-path discovery, rename persistence, cheapest-first checks (C1<C3<C5<C4), fold, recall fanout; ERROR→ERRORED.
-- `fold.py` — `fold()` pure fn → TRUSTED/DEGRADED/REVOKED/UNKNOWN. (Born state is `WARRANTED`, set at seal.)
-- `bindings.py` — binding diagnostics + opt-in `--binding-gate` (off/warn/fail). Flags: unbacked, single-file, short-literal, ambiguous-mention, trigger-only-symbol, unwatched-mention.
-- `symbol_index.py` — Python symbol→definer index + pyproject console-script index; ambiguity skipped.
-- `gitio.py` — git plumbing incl. `file_at_ref` (needed for trusted-base).
-- `commands.py` / `cli.py` — command surface; exit codes 0/2/3/4/5/6.
-- `store.py` / `blast.py` / `report.py` — derived SQLite, lineage, audit JSONL.
-
-### Trust-boundary map
-- Non-executable: C1, C3, typed C5. Executable: C4 `pytest:`, C5 `shell:`.
-- `--deny-exec`/`--deny-shell` (+ env) are fail-closed, NOT a sandbox. Blocked → ERROR.
-- Sidecars are source of truth; SQLite derived (`sync` rebuilds).
-- Action runs checkers from the **checked-out (head)** sidecars → trusted/internal only today; trusted-base is design-only (`docs/TRUSTED_BASE_ACTION_DESIGN.md`).
-
-### Benchmark/docs freshness map
-- `docs/BENCHMARK_v0.7.0.md` — title-stamped **v0.7.0**, synthetic. HISTORICAL.
-- `docs/BENCHMARK_BINDING_LIFECYCLE.md` — header `dorian 0.9.0`, run_id, 808 pairs. HISTORICAL.
-- `docs/PUBLIC_BENCHMARK_PROTOCOL.md` — protocol only, no results.
-- No current-version (0.11+) result doc exists.
-
-### Report findings verified against code (code wins)
-- **README `WARRANTED -> REVOKED` is NOT drift.** Report (medium-confidence) called it stale. Verified: `fold.fold()` only emits TRUSTED/DEGRADED/REVOKED/UNKNOWN; the *born* trust state is `WARRANTED` (set at seal); the first fold therefore renders `WARRANTED -> <new>`. `tests/test_render_md.py:168-169` pins `WARRANTED -> REVOKED` and `WARRANTED -> UNKNOWN` as correct md output. Action: **do not "fix"; add a short trust-state vocabulary note to remove reader confusion.**
-- **C4 adequacy blind spot** — report marks INFERENCE; confirmed: `c4_test.py` maps pytest exit codes only, no assertion/relevance inspection. Valid advisory target (WP6).
-- **PyPI install wording** — report marks UNVERIFIED. Per project state, dorian is NOT on PyPI; README "until the first PyPI release … install from source" is accurate. Keep.
-
-## Report coverage matrix (every material finding classified)
-
-Categories: IMPL=must-implement · TEST=must-test regression · DOC=must-document · BENCH=must-benchmark · BOUNDARY=honest non-goal · DONE=already in v0.11.0 · DEFER=post-V1/blocked.
-
-| # | Report finding / recommendation | Category | Current evidence | Planned action | Acceptance/verification | Status |
-|---|---|---|---|---|---|---|
-| 1 | README trust-state vocab (WARRANTED vs TRUSTED/…) | DOC | code correct; README lacks a glossary | add trust-state legend; keep examples | docs test + render_md tests stay green | TODO |
-| 2 | ERROR must never collapse into BROKEN | DONE+TEST | base/fold/revalidate all enforce | keep; add a guard test if any new path | existing + new ERROR≠BROKEN tests | TODO |
-| 3 | C1 span + c2lite regression | DONE | test_c1.py | none (keep green) | test_c1 passes | DONE |
-| 4 | C3 regex ReDoS timeout regression | DONE | test_c3_regex_timeout.py (slow) | none | passes | DONE |
-| 5 | C3 symbol existence ceiling / gutted-body | IMPL+DOC | symbol: existence-only | add `py-signature:` structural checker (WP3) | gutted-body PASS under symbol, FAIL under signature when sig changes; body-only stays PASS (documented ceiling) | TODO |
-| 6 | C3 string/regex comment/docstring survival | IMPL+DOC | raw text search | add semantic code-context search mode (WP4) | literal only in comment/docstring → FAIL in code mode | TODO |
-| 7 | C4 pytest vacuous/zero-assertion adequacy | IMPL | none | advisory adequacy lint (WP6) | zero-assertion / assert-True node warns; normal test does not | TODO |
-| 8 | C5 typed grammar limits / snapshot brittleness | BOUNDARY+DOC | documented | document in V1-meaning; optional structural data checker DEFER | doc states grammar bounds | TODO |
-| 9 | duplicate claim-id rejection | DONE | seal.py step 0 | keep | test_seal covers | DONE |
-| 10 | scope-lint named-read-set-only limitation | DONE+DOC | SECURITY_BOUNDARY | keep wording | docs test | DONE |
-| 11 | deny-exec/deny-shell fail-closed, not sandbox | DONE | policy.py, docs | keep | test_deny_exec_policy | DONE |
-| 12 | sidecar source-of-truth vs SQLite derived | DONE | seal/revalidate/sync | keep | test_store/sync | DONE |
-| 13 | canonical JSON / content-addressed identity | DONE | model.compute_id + Warrant.load integrity | keep | test_model/determinism | DONE |
-| 14 | atomic no-write on failed seal | DONE | seal os.replace + refusal order | keep | test_seal/deny_exec | DONE |
-| 15 | changed-path discovery + persisted rename | DONE | revalidate + store rename_log | keep | test_revalidate | DONE |
-| 16 | checker ordering + FAIL vs ERROR discipline | DONE | revalidate _check_claim | keep | existing | DONE |
-| 17 | fold + blast/recall lineage | DONE | fold.py, blast.py | keep | test_fold/test_blast | DONE |
-| 18 | audit/state separate-transaction limitation | BOUNDARY | fold.py docstring documents it | document in V1-meaning as known limitation | doc names it | TODO |
-| 19 | binding ambiguity handling | DONE | symbol_index ambiguous_symbol_mentions + flag | keep; extend provenance (WP5) | test_symbol_index | DONE |
-| 20 | oversized/unparseable file diagnostics | IMPL | silently skipped today | surface multi-index unparse diagnostics (WP5) loudly | giant/unparseable supported file → diagnostic not silent | TODO |
-| 21 | pyproject script binding | DONE | pyproject_script_definers | keep | test_symbol_index | DONE |
-| 22 | watch glob over/under-match risk | TEST | _covered glob logic | add a glob over/under test if WP5 touches it | test | TODO |
-| 23 | public/fork self-attested verdict risk | IMPL+DOC | head-mode only | trusted-base checker-source (WP7) | exploit fixtures: PR-added/modified exec checker not run; non-exec rewrite surfaced | TODO |
-| 24 | trusted-base design + non-sandbox caveat | IMPL+DOC | design-only | implement `--checker-source base` + Action input; keep non-sandbox caveat | WP7 test matrix | TODO |
-| 25 | historical benchmark docs (v0.7.0, v0.9.0) | DOC | unlabeled as historical in body | add HISTORICAL banner; README cross-link labels | docs wording test | TODO |
-| 26 | public benchmark protocol w/o results | DOC | protocol only | keep; note in current-results doc | unchanged | TODO |
-| 27 | current-version benchmark rerun | BENCH | none | rerun + version-stamped `BENCHMARK_CURRENT.md` | bench smoke + stamp present | TODO |
-| 28 | extractor remains draft/experimental | DONE | README + AGENT_CLAIMS | keep; do not promote | docs test | DONE |
-| 29 | release/install-status uncertainty | DOC | README source-install accurate | keep; V1 release report states status | report | TODO |
-| 30 | checker-strength / claim-risk visibility | IMPL | bindings flags exist but no strength score | strength + claim-risk diagnostics (WP2) | behavior+symbol → adequacy-mismatch; unbacked load-bearing → high risk | TODO |
-| 31 | multi-index binding (routes/config/etc.) | IMPL | python+script only | config-key index (WP5), provenance-tagged | config-key change selects claim; ambiguous skipped+warned | TODO |
-| 32 | warrant-quality mutation harness | BENCH | repo-level bench only | `dorian bench warrant-quality` (WP8) | deterministic per-claim trigger/truth score on fixture | TODO |
-
-## Work-package status (live)
-
-| WP | Title | Status |
-|---|---|---|
-| WP1 | docs/evidence hygiene | DONE (trust-state legend; historical banners on v0.7.0/0.9.0 benchmark docs; docs/V1_SCOPE.md; README command-surface + new-forms + historical labels; benchmark-evidence wording tests) |
-| WP2 | checker-strength / claim-risk linter | DONE (strength.py; surfaced in `bindings` + binding-gate warn; 19 tests) |
-| WP3 | Python structural checkers (py-signature, py-const) | DONE (pyast.py + C3 subgrammars; 27 tests incl. e2e) |
-| WP4 | semantic-context source search (`code:`) | DONE (pyast.code_only_python + C3 `code:`; 12 tests) |
-| WP5 | multi-index binding (config-key) | DONE (symbol_index.config_key_index + claim_watch_paths; TOML/JSON only, YAML excluded = zero-dep; provenance in bind-suggest; ambiguity + unparseable surfaced; 9 tests) |
-| WP6 | C4 test-adequacy lint | DONE (strength.c4_adequacy; folded into WP2 tests) |
-| WP7 | trusted-base checker-source mode | DONE (revalidate --checker-source base + Action checker_trust; 10-case exploit matrix) |
-| WP8 | warrant-quality mutation harness | DONE (bench/warrant_quality.py; `dorian bench warrant-quality`; deterministic, offline, never mutates real repo; trigger vs verdict; ERROR bucket distinct; honest scope = structural/existence forms scored, others reported strength-only; 7 tests) |
-| WP9 | current-version benchmark results | TODO |
-| WP10 | V1 release prep / decision | DONE — version surfaces synced to `1.0.0rc1` (pyproject/__init__/uv.lock); no tag/push/publish. All gates pass; adversarial-review BLOCK resolved. |
-
-Branch `dorian-v1-strengthening`, 9 commits off `main`:
-`58b39e2` WP3/4/2/6 · `6a8298c` WP7 · `04ab60b` WP5 · `2a66a49` WP8 · `4e586a7` WP9/WP1 ·
-`2a4befa` byte-match fix · `a6595ba` adversarial-review BLOCK fixes · `b7376e7` bump 1.0.0rc1 ·
-`4710604` benchmark re-stamp.
-
-Adversarial review (5 lenses, BLOCK): 6 must-fixes + 2 hygiene items all resolved with
-regression tests. Final gate: ruff clean, 658 non-slow pass, 733 total (incl slow) green.
diff --git a/action/README.md b/action/README.md
index 558c19e..51b2254 100644
--- a/action/README.md
+++ b/action/README.md
@@ -126,7 +126,7 @@ Hard rules either way:
 | --------------- | -------------------------------------------- | ------------------------------------------------------------------------ |
 | `fail_on`       | `revoked`                                    | when to fail the step: `revoked` (exit 4 only), `degraded` (3 or 4), `never` |
 | `base`          | `${{ github.event.pull_request.base.sha }}`  | git ref passed to `dorian revalidate --since`                            |
-| `install`       | `dorian-vwp`                                 | pip spec; pin `dorian-vwp==0.6.*`, or `.` for checkout installs          |
+| `install`       | `dorian-vwp`                                 | pip spec; until the first PyPI release, use the git source spec (below), or `.` for checkout installs |
 | `deny_exec`     | `false`                                      | refuse to run executable checkers (C4 pytest, C5 shell): they ERROR. For untrusted/fork PRs; fail-closed, not a sandbox |
 | `deny_shell`    | `false`                                      | narrower than `deny_exec`: block only C5 shell, still allow C4 pytest    |
 | `checker_trust` | `head`                                       | `head` runs the checked-out checker spec (trusted repos); `base` runs the base-ref spec so PR-authored executable checkers never run (public/fork PRs) |
diff --git a/action/action.yml b/action/action.yml
index 77fabdc..67d5697 100644
--- a/action/action.yml
+++ b/action/action.yml
@@ -23,8 +23,9 @@ inputs:
     default: ${{ github.event.pull_request.base.sha }}
   install:
     description: >-
-      pip requirement spec for dorian. Pin a release ('dorian-vwp==0.6.*')
-      or pass '.' to install the checked-out source.
+      pip requirement spec for dorian. Until the first PyPI release, use a git
+      source spec ('dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git'),
+      pass '.' to install the checked-out source, or pin a tag once published.
     required: false
     default: dorian-vwp
   deny_exec:
diff --git a/docs/BENCHMARK_BINDING_LIFECYCLE.md b/docs/BENCHMARK_BINDING_LIFECYCLE.md
index 6c51b96..1158b83 100644
--- a/docs/BENCHMARK_BINDING_LIFECYCLE.md
+++ b/docs/BENCHMARK_BINDING_LIFECYCLE.md
@@ -2,7 +2,7 @@
 
 > **HISTORICAL — measured at dorian 0.9.0** (see the run header below; the preserved 808-pair
 > full run). Evidence about the 0.9.0 implementation, not current behavior. The current-version
-> rerun (0.11.0, identical results — see [`BENCHMARK_CURRENT.md`](BENCHMARK_CURRENT.md)) confirms
+> rerun (1.0.0rc1, identical results — see [`BENCHMARK_CURRENT.md`](BENCHMARK_CURRENT.md)) confirms
 > the V1 changes did not regress it. NOTE: `dorian bench binding-lifecycle` REGENERATES this file;
 > restore it from git after a rerun so the historical record survives.
 
diff --git a/docs/BENCHMARK_CURRENT.md b/docs/BENCHMARK_CURRENT.md
index 48cf898..ef3834a 100644
--- a/docs/BENCHMARK_CURRENT.md
+++ b/docs/BENCHMARK_CURRENT.md
@@ -11,15 +11,18 @@ and are kept as-is for provenance.
 | field | value |
 | --- | --- |
 | dorian version | `1.0.0rc1` (V1 release candidate) |
-| measured commit | `b7376e7762571e7c802c220aa50c241d2dae7e39` |
+| metric commit | `b7376e7` (the benchmark figures were measured here) |
+| release commit | the tagged `v1.0.0rc1` commit is a later **docs/release-hygiene only** commit; `git diff b7376e7..<tag> -- src bench` is empty, so the figures apply unchanged |
 | Python | 3.12.4 |
 | platform | darwin (CI matrix: 3.11 / 3.12 / 3.13) |
 | reproduce | `dorian bench large-mutation` · `dorian bench binding-lifecycle` · `dorian bench realworld-usecases` |
 
 These numbers were re-run at the `1.0.0rc1` commit *after* the adversarial-review fixes
-landed, confirming those fixes (py-const type check, `code:` docstring handling, config-key
-stopwords) did not move the benchmark figures — expected, since the suites exercise C1/C3
-(symbol/regex/string/path)/C5, not the new structural/config-binding paths.
+landed AND again during the independent release audit, confirming those fixes (py-const type
+check, `code:` docstring handling, config-key stopwords) did not move the benchmark figures —
+expected, since the suites exercise C1/C3 (symbol/regex/string/path)/C5, not the new
+structural/config-binding paths. Commits between the metric commit and the release tag change
+only docs/release hygiene, never checker or benchmark logic.
 
 ## Results
 
diff --git a/docs/ROADMAP_BACKLOG.md b/docs/ROADMAP_BACKLOG.md
index 85b27b4..f0b916f 100644
--- a/docs/ROADMAP_BACKLOG.md
+++ b/docs/ROADMAP_BACKLOG.md
@@ -125,13 +125,11 @@ before marketing, deterministic verification before AI automation.*
 
 - id: trusted-base-action-mode
   title: Trusted-base Action mode for public fork PRs
-  status: DEFER/HUMAN-REVIEW
+  status: SHIPPED (V1, 1.0.0rc1)
   problem: deny-exec removes code execution but not the self-attested-verdict problem; a real public-fork story needs base-ref checker definitions.
-  evidence: docs/TRUSTED_BASE_ACTION_DESIGN.md (design only).
-  proposed_scope: execute only checker specs present on the trusted base ref; parse/lint (never execute) PR-changed sidecars; deny-exec default for forks; fail-closed; tests simulating a fork sidecar trying to execute shell.
-  why_deferred: Action security defaults are a trust-model change; needs maintainer review and dedicated tests before any public-fork-safe claim.
-  human_review_required: yes  # Action trust model
-  confidence: medium
+  evidence: implemented — revalidate --checker-source base (src/dorian/revalidate.py), Action checker_trust input (action/action.yml), tests/test_trusted_base.py (10-case exploit matrix); see docs/TRUSTED_BASE_ACTION_DESIGN.md (STATUS: IMPLEMENTED).
+  shipped_scope: executes only checker specs resolved from the trusted base ref; PR-added/modified executable checkers never run; missing/tampered base sidecar fails closed; deny-exec composes. Residual risk (documented; this is not a sandbox) is that a base-approved pytest checker can still execute PR-head code, so pair with deny-exec for untrusted forks.
+  confidence: high
 
 - id: binding-beyond-python-symbols
   title: Bind routes / configs / schemas / non-Python indices
diff --git a/docs/START_HERE.md b/docs/START_HERE.md
index 5a5cfd6..fc1667b 100644
--- a/docs/START_HERE.md
+++ b/docs/START_HERE.md
@@ -38,8 +38,9 @@ exists to catch).
 
 - [`action/README.md`](../action/README.md) — the composite GitHub Action and its **security notes**
   (checker programs are executable; trusted repos only).
-- [`TRUSTED_BASE_ACTION_DESIGN.md`](TRUSTED_BASE_ACTION_DESIGN.md) — design (not yet implemented) for a
-  trusted-base Action mode that executes only base-branch checker specs.
+- [`TRUSTED_BASE_ACTION_DESIGN.md`](TRUSTED_BASE_ACTION_DESIGN.md) — the trusted-base Action mode
+  (`revalidate --checker-source base` / Action `checker_trust: base`), **implemented in V1**: it
+  executes only base-branch checker specs (a trust root, not a sandbox) for public/fork PRs.
 
 ## I want the why and the roadmap
 
diff --git a/src/dorian/checkers/c3_ref.py b/src/dorian/checkers/c3_ref.py
index fe06e35..7c6b7ca 100644
--- a/src/dorian/checkers/c3_ref.py
+++ b/src/dorian/checkers/c3_ref.py
@@ -16,12 +16,18 @@
                                 documented ceiling — only a C4 test catches that.
 - py-const:<file>::<qualname>::<literal>       structural (Python AST): the named
                                 module/class assignment has the stated LITERAL value
-                                (compared by value, so quote style / int base / spacing
-                                are tolerated, and a comment/docstring mention cannot
-                                pass). FAIL on a value drift; ERROR on a non-literal RHS.
-
-The `py-*` structural forms parse the file's AST (`dorian.pyast`); they read only and
-never execute the target. See `dorian/pyast.py` and `spec/checkers.md`.
+                                (compared by value AND type, so quote style / int base /
+                                spacing are tolerated but 30 != 30.0 and 1 != True, and a
+                                comment/docstring mention cannot pass). FAIL on a value
+                                drift; ERROR on a non-literal RHS.
+- code:<file>::<pattern>        semantic regex (Python-only): re.search over the file with
+                                comments and docstrings BLANKED (a fact surviving only in a
+                                comment/docstring FAILs; real string literals are kept).
+                                Same 500-char cap + worker-process timeout as `regex:`;
+                                ERROR('code_unparseable') on a non-parseable / non-Python target.
+
+The `py-*` structural and `code:` semantic forms parse the file's AST (`dorian.pyast`);
+they read only and never execute the target. See `dorian/pyast.py` and `spec/checkers.md`.
 
 `regex:` is the shape-tolerant form: prefer it over `string:` for facts that must
 survive reformatting (the v0.0 false-positive class — e.g. 'TIMEOUT\\s*=\\s*30'
diff --git a/tests/test_action_security_defaults.py b/tests/test_action_security_defaults.py
index f369aff..a27a3d1 100644
--- a/tests/test_action_security_defaults.py
+++ b/tests/test_action_security_defaults.py
@@ -74,3 +74,30 @@ def test_security_docs_reflect_trusted_base_as_implemented() -> None:
         assert "not yet a full public-fork story" not in low
         # the actual feature is named (Action input and/or CLI flag)
         assert "checker_trust" in doc or "checker-source" in doc
+
+
+def test_no_live_doc_calls_trusted_base_unimplemented() -> None:
+    """Release-audit regression: trusted-base shipped in V1, so NO live doc may still
+    describe it as unimplemented/design-only. A prior pass fixed SECURITY.md and
+    action/README.md but missed START_HERE.md and ROADMAP_BACKLOG.md — this scans every
+    live doc (README, SECURITY, action README, docs/*.md). Archival change-notes
+    (docs/changes/) and history (docs/history/) are dated snapshots, intentionally excluded
+    (docs/*.md does not recurse into them)."""
+    import re
+
+    stale = re.compile(
+        r"not\s+(yet\s+)?implemented|design[- ]only|\(not implemented\)", re.IGNORECASE
+    )
+    live = [REPO_ROOT / "README.md", REPO_ROOT / "SECURITY.md", ACTION_README]
+    live += sorted((REPO_ROOT / "docs").glob("*.md"))
+    offenders = []
+    for path in live:
+        if not path.is_file():
+            continue
+        for i, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
+            low = line.lower()
+            if ("trusted-base" in low or "trusted base" in low) and stale.search(line):
+                offenders.append(f"{path.relative_to(REPO_ROOT)}:{i}: {line.strip()}")
+    assert not offenders, "live doc(s) still call trusted-base unimplemented:\n" + "\n".join(
+        offenders
+    )
diff --git a/tests/test_benchmark_evidence.py b/tests/test_benchmark_evidence.py
index 640e955..30dd6d0 100644
--- a/tests/test_benchmark_evidence.py
+++ b/tests/test_benchmark_evidence.py
@@ -34,12 +34,14 @@ def test_historical_benchmark_docs_are_labeled_historical() -> None:
 
 
 def test_current_benchmark_doc_is_version_and_commit_stamped() -> None:
+    import re
     import tomllib
 
     doc = _read("docs/BENCHMARK_CURRENT.md")
     version = tomllib.loads(_read("pyproject.toml"))["project"]["version"]
     assert version in doc, f"current benchmark doc must stamp the live version {version!r}"
-    assert "measured commit" in doc.lower()
+    # must be commit-stamped (metric/release commit) — accept any 7+ hex SHA reference
+    assert "commit" in doc.lower() and re.search(r"\b[0-9a-f]{7,40}\b", doc)
     assert "Python" in doc  # environment summary
     assert "reproduce" in doc.lower()
     # the mandatory non-overclaim block
diff --git a/tests/test_warrant_quality.py b/tests/test_warrant_quality.py
index b3bebe7..de7b479 100644
--- a/tests/test_warrant_quality.py
+++ b/tests/test_warrant_quality.py
@@ -124,6 +124,31 @@ def test_deterministic_output(wq, tmp_path: Path) -> None:
     assert (repo / "src/config.py").read_text() == "TIMEOUT = 30\n"
 
 
+def test_path_escape_operand_does_not_escape_sandbox(wq, tmp_path: Path) -> None:
+    """A warrant-controlled `../`-escaping file operand must not make the harness read or
+    write outside the repo/temp sandbox — the module docstring advertises containment."""
+    from dorian.model import CheckerSpec, Claim
+
+    repo = _repo(tmp_path)
+    # a file OUTSIDE the repo the mutation must never read or rewrite
+    outside = tmp_path / "outside.py"
+    outside.write_text("LIMIT = 5\n", encoding="utf-8")
+    outside_mtime = outside.stat().st_mtime
+    claim = Claim(
+        id="esc",
+        text="x",
+        kind="quantity",
+        load_bearing=True,
+        checkers=(CheckerSpec(type="C3", program="py-const:../outside.py::LIMIT::5"),),
+    )
+    verdict = wq._run_mutated(
+        repo, claim, 0, "../outside.py", lambda t: t + "\nX=1\n", ExecutionPolicy()
+    )
+    assert verdict.value == "ERROR"  # refused, not run against the out-of-repo file
+    assert outside.read_text(encoding="utf-8") == "LIMIT = 5\n"  # untouched
+    assert outside.stat().st_mtime == outside_mtime  # not even rewritten
+
+
 def test_cli_smoke_json(wq, tmp_path: Path, capsys) -> None:
     from dorian import cli
 

From 24ae7c82bcd6f1bffc273f8fb2a6bf4ab2ffc035 Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 20:13:29 +0530
Subject: [PATCH 12/13] audit: restamp BENCHMARK_CURRENT metric commit to the
 audited release commit

The benchmarks were re-run during the release audit at 33e9eaf and are identical
(large-mutation P=R=0.93, 11.6x/10.4x; binding-lifecycle 808 pairs 0.54->1.00,
precision/recall 1.00; realworld 2/1/2). Stamp the metric commit as 33e9eaf; the
tagged release commit is only this docs re-stamp (git diff 33e9eaf..HEAD -- src bench
is empty). Fixes the earlier note, which referenced b7376e7, a commit that predates
the c3_ref docstring edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/BENCHMARK_CURRENT.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/BENCHMARK_CURRENT.md b/docs/BENCHMARK_CURRENT.md
index ef3834a..a26d3df 100644
--- a/docs/BENCHMARK_CURRENT.md
+++ b/docs/BENCHMARK_CURRENT.md
@@ -11,8 +11,8 @@ and are kept as-is for provenance.
 | field | value |
 | --- | --- |
 | dorian version | `1.0.0rc1` (V1 release candidate) |
-| metric commit | `b7376e7` (the benchmark figures were measured here) |
-| release commit | the tagged `v1.0.0rc1` commit is a later **docs/release-hygiene only** commit; `git diff b7376e7..<tag> -- src bench` is empty, so the figures apply unchanged |
+| metric commit | `33e9eaf` (the benchmark figures were measured here, during the release audit) |
+| release commit | the tagged `v1.0.0rc1` commit is the immediate docs-only re-stamp of this file over the metric commit; `git diff 33e9eaf..<tag> -- src bench` is empty, so the figures apply unchanged |
 | Python | 3.12.4 |
 | platform | darwin (CI matrix: 3.11 / 3.12 / 3.13) |
 | reproduce | `dorian bench large-mutation` · `dorian bench binding-lifecycle` · `dorian bench realworld-usecases` |

From 79136d57b6c0fe9970de1e3bea6d92cf569a565e Mon Sep 17 00:00:00 2001
From: Ajay Surya <ajaysurya.senthilrajan@redica.com>
Date: Mon, 15 Jun 2026 20:50:07 +0530
Subject: [PATCH 13/13] docs(readme): reflect shipped trusted-base + v1.0.0rc1
 tag

- intro blockquote: note checker_trust: base as the public/fork trust root (still not a
  sandbox), instead of flatly "not public CI for forked PRs" now that trusted-base shipped.
- roadmap: "tagged release" is done (v1.0.0rc1 prerelease); only PyPI trusted publishing remains.

Post-tag branch update (the v1.0.0rc1 tag stays frozen at 24ae7c8); folds into the next
tag / the PR to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 README.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 06c59cf..d58973b 100644
--- a/README.md
+++ b/README.md
@@ -36,8 +36,9 @@ now and is re-checked on every future change, so a confident summary doesn't qui
 > commits, nothing else — with **zero model tokens at check time**, so the checker can't be talked
 > past by the code it verifies. Because checker programs are *executable* (C4 runs `pytest`, C5
 > `shell:` runs a command), it is built for **trusted, internal repositories** — not public CI
-> taking forked pull requests. Pairs naturally with a coding agent such as **Claude Code**
-> ([how](#using-dorian-with-claude-code)).
+> taking forked pull requests by default (for public/fork PRs, `checker_trust: base` runs only
+> base-approved checker specs — a trust root, still not a sandbox). Pairs naturally with a coding
+> agent such as **Claude Code** ([how](#using-dorian-with-claude-code)).
 
 ## Table of contents
 
@@ -475,7 +476,8 @@ work perishable, so you find out when it expired.
   ([`docs/REALWORLD_USECASES.md`](docs/REALWORLD_USECASES.md)) reproduce real problem *classes*; the
   next rung is frozen public-repo SHAs with manual claims and reproducible known-truth labels
   ([`docs/SOLO_VALIDATION_LADDER.md`](docs/SOLO_VALIDATION_LADDER.md)).
-- **Tagged release and PyPI trusted publishing.**
+- **PyPI trusted publishing** — tagged releases now ship (latest: **`v1.0.0rc1`**, a V1 release
+  candidate / prerelease); publishing `dorian-vwp` to PyPI via a Trusted Publisher is next.
 
 Non-goals stay non-goals: no servers, no dashboards, no hosted control plane, no model at check time.
 Local-first is the design center.