dorian V1 strengthening (1.0.0rc1) by ajaysurya1221 · Pull Request #6 · ajaysurya1221/dorian

ajaysurya1221 · 2026-06-15T15:25:27Z

Summary

V1 strengthening program (from v0.11.0), released as the prerelease tag v1.0.0rc1 and
independently release-audited (FIXED_PASS). All changes are additive and backward-compatible.

- Python structural checkers py-signature: / py-const: (AST; close the symbol: existence and string:/regex: comment-survival ceilings; py-const: compares value and type).
- Semantic-context code: regex over comment/docstring-stripped Python.
- Checker-strength / claim-risk diagnostics in dorian bindings (+ C4 adequacy lint); advisory.
- Multi-index binding: config keys in tracked .toml/.json (zero runtime deps; YAML excluded).
- Trusted-base revalidate --checker-source base / Action checker_trust: base: base-approved checker specs only for public/fork PRs (a trust root, not a sandbox).
- dorian bench warrant-quality: offline per-claim mutation scoring.
- Docs: V1_SCOPE.md, version-stamped BENCHMARK_CURRENT.md, historical benchmark labels.
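
The semantic-context `code:` idea in the list above can be illustrated with a minimal sketch: blank out comments and docstrings, then run the regex over what remains. The names `strip_comments_and_docstrings` and `code_matches` are hypothetical, this is not dorian's `code_only_python`, and the ReDoS worker-timeout mentioned in the PR is omitted.

```python
import ast
import io
import re
import tokenize


def strip_comments_and_docstrings(source):
    """Return source with comments and docstrings replaced by spaces."""
    lines = source.splitlines(keepends=True)
    spans = []  # (start_row, start_col, end_row, end_col), rows are 1-based

    # Comments come from the tokenizer, so a '#' inside a string is left alone.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            spans.append((*tok.start, *tok.end))

    # Docstring position: a bare string as the first statement of a
    # module, class, or function. JoinedStr covers an f-string in that slot.
    scopes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, scopes) and node.body and isinstance(node.body[0], ast.Expr):
            value = node.body[0].value
            plain = isinstance(value, ast.Constant) and isinstance(value.value, str)
            if plain or isinstance(value, ast.JoinedStr):
                spans.append((value.lineno, value.col_offset,
                              value.end_lineno, value.end_col_offset))

    # Blank by span, not by whole line, so other code on the same line survives.
    # Assumption: ASCII source (ast column offsets are UTF-8 byte offsets).
    for r0, c0, r1, c1 in spans:
        for row in range(r0, r1 + 1):
            text = lines[row - 1]
            body_len = len(text.rstrip("\r\n"))
            lo = c0 if row == r0 else 0
            hi = c1 if row == r1 else body_len
            lines[row - 1] = text[:lo] + " " * (hi - lo) + text[hi:]
    return "".join(lines)


def code_matches(pattern, source):
    """True only if the pattern matches real code, not a comment or docstring."""
    return re.search(pattern, strip_comments_and_docstrings(source)) is not None


EXAMPLE = '''def f():
    """timeout is 30"""
    x = "kept"  # timeout is 30
    return x
'''
```

With `EXAMPLE`, a fact that survives only in the docstring and the comment no longer matches, while the string literal on the commented line is kept.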

Two adversarial reviews (implementation BLOCK → 6 must-fixes; independent release audit
FIXED_PASS → 2 doc-drift blockers + should-fixes) all resolved with regression tests.

Invariants preserved: ERROR ≠ BROKEN; checkers read-only (except C4/C5-shell); binding is
trigger-only; zero runtime dependencies. Release candidate, not final 1.0.0 (real-repo
public micro-benchmark + RC caveats remain post-V1; see docs/V1_SCOPE.md).

Test Plan

- CI matrix (Python 3.11 / 3.12 / 3.13) green on this PR
- Local at the release commit: uv run pytest → 735 passed (incl. slow); ruff clean; uv build + clean-venv install → dorian 1.0.0rc1; benchmarks reproduce unchanged; trusted-base exploit matrix (10 cases) passes.

Summary by CodeRabbit

New Features
- Python structural checkers: signature validation, constant value verification, semantic code matching
- Trusted-base mode for secure pull request validation
- Automatic config-key binding from TOML/JSON files
- Warrant-quality benchmark harness
- Checker strength and claim risk diagnostics
Documentation
- V1 scope and boundaries documented
- Updated security guidance for untrusted sources
- Versioned benchmark results published

…ength diagnostics

WP3/WP4/WP2/WP6 of the v0.11.0 -> V1 strengthening program (research-report driven).

- C3 py-signature:/py-const: structural checkers (dorian/pyast.py): AST-based, close the symbol-existence and string/regex comment-survival ceilings; gutted-body remains the documented ceiling (only C4 catches a body change behind an unchanged signature).
- C3 code: semantic-context regex over comment/docstring-stripped Python (same ReDoS worker-timeout as regex:), so a fact surviving only in a comment/docstring FAILs.
- checker-strength / claim-risk diagnostics (dorian/strength.py): classify truth strength per checker, flag kind-vs-strength adequacy mismatches, advisory C4 zero/constant-assertion lint; surfaced in `dorian bindings` (JSON + human) and the opt-in --binding-gate warn output. Advisory only, never a verdict/trust/exit change.
- watch derivation + binding diagnostics recognize the new C3 forms (seal, bindings).
- spec/checkers.md + docs/AGENT_CLAIMS.md document the new grammars and the ceiling.

+58 tests (test_pystructural, test_semantic_context, test_strength); 619 non-slow pass. ERROR-vs-FAIL discipline preserved; trigger-vs-truth split made explicit, not blurred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
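
To make the py-signature: idea and its gutted-body ceiling concrete, here is a minimal sketch that compares a function's parameter list through the AST without executing anything. `check_signature` here is a hypothetical stand-in, not the function of the same name in dorian/pyast.py, and the PASS/FAIL/ERROR mapping (including FAIL for a missing function) is an assumption.

```python
import ast


def check_signature(source, expected_signature):
    """Compare the parameters of a def in source with an expected 'name(params)'.

    Only the argument list is compared; return annotations and bodies are ignored.
    """
    try:
        # Parse the expectation by wrapping it in a throwaway def.
        expected = ast.parse(f"def {expected_signature}: pass").body[0]
        tree = ast.parse(source)
    except SyntaxError:
        return "ERROR"  # unparseable input is an error, not a failed claim

    for node in ast.walk(tree):
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_def and node.name == expected.name:
            # ast.dump omits line/column attributes by default, so two
            # argument lists compare equal when they are structurally the same.
            same = ast.dump(node.args) == ast.dump(expected.args)
            return "PASS" if same else "FAIL"
    return "FAIL"  # assumption: a missing function fails the claim


ORIGINAL = "def connect(host, port=5432, *, timeout=30):\n    return open_socket(host, port)\n"
GUTTED = "def connect(host, port=5432, *, timeout=30):\n    return None\n"
```

Both `ORIGINAL` and `GUTTED` pass against `connect(host, port=5432, *, timeout=30)`, which is the ceiling the commit describes: a body change behind an unchanged signature is invisible to a structural check.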

…(WP7)

revalidate --checker-source {head,base} (default head) + Action checker_trust input. base mode resolves each candidate claim's checker SPEC from the --since (base) ref and runs it against PR-head sources, so a PR-added or PR-modified executable (C4/C5 shell) checker is never executed and a rewritten checker cannot self-attest a verdict (base spec wins; the change is surfaced as a trust-root note). Fail-closed: a missing/tampered base sidecar ERRORs (never executed), never BROKEN, never green. Composes with deny-exec. NOT a sandbox: a base-approved pytest checker can still run head code, stated in every surface.

- revalidate.py: checker_source param, _load_base_warrant (integrity-checked base sidecar via gitio.file_at_ref), RevalResult.notes, text/md rendering of notes.
- cli.py/commands.py: --checker-source flag + DORIAN_CHECKER_SOURCE env fallback; base requires --since.
- action.yml: checker_trust input (default head) -> DORIAN_CHECKER_SOURCE; README + Inputs table updated (also documents the pre-existing deny_exec/deny_shell inputs).
- docs: TRUSTED_BASE_ACTION_DESIGN status -> IMPLEMENTED; SECURITY_BOUNDARY public-fork checklist updated (trust-root conditions met; sandboxing still out of scope).
- tests/test_trusted_base.py: the §6 exploit matrix (10 cases): each "executed?" case proven by a sentinel touch that must NOT appear under base mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
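
The base-mode resolution rule described in this commit can be sketched as a pure function. `resolve_checker_spec`, the dict-shaped specs, and the `"run"`/`"error"`/`"skip"` actions are hypothetical; the PR does not say how a claim that exists only at head is reported, so the sketch simply declines to run it.

```python
def resolve_checker_spec(claim_id, head_spec, base_sidecar):
    """Decide which checker spec may run for one claim under base mode.

    base_sidecar is a dict of claim_id -> spec read from the base ref, or
    None when the base sidecar is missing or failed its integrity check.
    Returns (spec_to_run_or_None, action, notes).
    """
    if base_sidecar is None:
        # Fail closed: nothing from the PR head is trusted enough to execute.
        return None, "error", ["base sidecar missing or tampered; checker not executed"]

    if claim_id not in base_sidecar:
        # Assumption: a claim added by the PR has no base-approved spec.
        return None, "skip", ["claim has no base-approved checker; not executed"]

    base_spec = base_sidecar[claim_id]
    notes = []
    if head_spec != base_spec:
        # The base spec wins; the head rewrite is only reported.
        notes.append("trust-root: PR changed the checker spec; base spec used")
    return base_spec, "run", notes


BASE = {"claim-1": {"kind": "C3", "spec": "symbol:connect"}}
REWRITTEN = {"kind": "C5", "spec": "sh ./always_pass.sh"}
```

Note that even in the `"run"` case the base-approved checker still executes against PR-head sources, which is why the commit calls this a trust root and not a sandbox.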

Extends binding beyond Python definers/console-scripts to config keys in tracked .toml/.json files: a claim mentioning a config key is re-checked when the defining config file changes. Conservative and trigger-only (never proves truth):

- symbol_index.config_key_index: key -> tracked .toml/.json files + unparseable list. YAML deliberately excluded (parsing needs a third-party dep; core stays zero-dep).
- claim_config_watch_paths + claim_watch_paths (unified symbol+script+config union); verify/rebind now widen with the merged watch set.
- ambiguous_config_mentions: a key in >1 file is skipped (a wrong watch is a false alarm) and surfaced via verify warnings + bind-suggest, never guessed.
- unparseable supported config files are surfaced as a diagnostic, never silent.
- bind-suggest gains provenance (bind (symbol) vs bind (config)) + ambiguous-config + unparseable-config lines; JSON adds bind_config/ambiguous_config/unparseable_config.
- config_key_index degrades to empty on a non-git repo (never blocks).

Updated test_symbol_index pyproject-script expectation (a script-name claim now also watches pyproject.toml where the script is declared) and the trusted-base design doc-guard (now IMPLEMENTED). 639 non-slow tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
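
A minimal sketch of the key-to-file index and the ambiguity rule described above, using only the standard library. The function names, the in-memory `files` mapping, and the tiny `STOPWORDS` set are illustrative assumptions; dorian's `config_key_index` works on tracked files in a git repo and its stopword list came from a later fix in this PR.

```python
import json

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    tomllib = None  # the sketch then indexes JSON only

STOPWORDS = frozenset({"dependencies", "name", "version"})  # illustrative only


def _keys(obj):
    """Yield every mapping key found anywhere in a parsed config document."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield key
            yield from _keys(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from _keys(item)


def build_config_key_index(files):
    """Map each key to the set of files defining it; also list unparseable files."""
    index, unparseable = {}, []
    for path, text in sorted(files.items()):
        try:
            if path.endswith(".json"):
                data = json.loads(text)
            elif path.endswith(".toml") and tomllib is not None:
                data = tomllib.loads(text)
            else:
                continue  # YAML and everything else is out of scope
        except ValueError:  # both decoders raise ValueError subclasses
            unparseable.append(path)
            continue
        for key in _keys(data):
            index.setdefault(key, set()).add(path)
    return index, unparseable


def watch_paths(mentioned_keys, index):
    """Return (files to watch, ambiguous keys) for the keys a claim mentions."""
    watch, ambiguous = set(), []
    for key in mentioned_keys:
        if key in STOPWORDS:
            continue  # too common to be a meaningful trigger
        paths = index.get(key, set())
        if len(paths) == 1:
            watch |= paths
        elif len(paths) > 1:
            ambiguous.append(key)  # reported, never guessed
    return watch, ambiguous


FILES = {
    "app.json": '{"max_workers": 4, "log": {"level": "info"}}',
    "other.json": '{"log": {"level": "debug"}, "dependencies": []}',
    "bad.json": "{not json",
}
```

The watch set is only a re-check trigger: a key bound to one file causes the claim to be revalidated when that file changes, and says nothing about whether the claim is true.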

…lity (WP8)

Offline, per-claim evidence: for each claim it derives deterministic mutations from the checker grammar and records whether the verdict matched expectation:

- falsify (rename symbol / reassign const / change param): expect FAIL; a PASS is a MISS;
- benign (trailing comment): expect PASS; a FAIL is BRITTLE (false alarm);
- ceiling (content drift keeping an existence symbol): expect PASS, recorded as the documented trigger-vs-truth ceiling, never a penalty.

ERROR (e.g. an executable checker under --deny-exec) is its own bucket, never a miss. Output is deterministic (no timestamps/randomness) and never mutates the real repo: each mutation runs against a throwaway copy of only the file the checker reads. Honest scope: structural/existence C3 forms are mutation-scored; string/regex/code, typed C5, C1, C4 are reported with strength and `mutation: unsupported` (no fabricated mutation). Registered as the `warrant-quality` bench subcommand. 7 tests; 645 non-slow pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
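
The expectation table in this commit maps naturally onto a small function. This is a sketch only: `score_mutation` is a hypothetical name, the outcome labels follow the `caught/missed/brittle/ok/ceiling` vocabulary in the review walkthrough, and the `"error"` label plus the handling of a non-PASS verdict on a ceiling mutation are assumptions.

```python
from collections import Counter


def score_mutation(kind, verdict):
    """Map a (mutation kind, checker verdict) pair to an outcome label."""
    if verdict == "ERROR":
        return "error"  # its own bucket, never counted as a miss
    if kind == "falsify":
        # The mutation makes the claim false, so the checker should FAIL.
        return "caught" if verdict == "FAIL" else "missed"
    if kind == "benign":
        # The mutation changes nothing meaningful, so the checker should PASS.
        return "ok" if verdict == "PASS" else "brittle"
    if kind == "ceiling":
        # Documented trigger-vs-truth ceiling: recorded, never penalised.
        # Assumption: any non-ERROR verdict lands here.
        return "ceiling"
    raise ValueError(f"unknown mutation kind: {kind!r}")


def tally(results):
    """Count outcomes for an iterable of (kind, verdict) pairs."""
    return Counter(score_mutation(kind, verdict) for kind, verdict in results)


RUNS = [("falsify", "FAIL"), ("falsify", "PASS"), ("benign", "PASS"),
        ("benign", "FAIL"), ("ceiling", "PASS"), ("falsify", "ERROR")]
```

Keeping ERROR out of the miss count mirrors the ERROR-vs-FAIL discipline stated elsewhere in the PR: a checker that could not run has said nothing about the claim.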

…WP1)

- docs/BENCHMARK_CURRENT.md: version- and commit-stamped reruns of the reproducible suites on current code: large-mutation (240 pairs, P=R=0.93, 11.6x/10.4x FP reduction), binding-lifecycle (808 pairs, selection recall 0.54->1.00, alarm precision/recall 1.00, 0 errored), realworld (5 cases 2/1/2), and the new warrant-quality harness. The reruns MATCH the historical runs (same content-derived run_id), proving the V1 changes are additive and do not regress the benchmarks. Includes a what-this-does-NOT-prove block.
- HISTORICAL banners on docs/BENCHMARK_v0.7.0.md (v0.7.0) and docs/BENCHMARK_BINDING_LIFECYCLE.md (0.9.0), each pointing to BENCHMARK_CURRENT.md; the historical numbers are preserved verbatim.
- docs/V1_SCOPE.md: what V1 strengthening means and does NOT mean (no universal semantic correctness; trusted-base is a trust root not a sandbox; config binding is TOML/JSON only; code:/structural are Python-only; extractor stays draft; carried-forward limitations).
- README: trust-state legend (WARRANTED born -> TRUSTED/DEGRADED/REVOKED/UNKNOWN), historical labels on the benchmark citations, command-surface entries for the new C3 forms, checker-strength in bindings, config provenance in bind-suggest, checker-source base, and bench warrant-quality.
- tests/test_benchmark_evidence.py: wording guards (historical docs labeled; current doc version/commit-stamped with a non-overclaim block; README links current; V1_SCOPE boundary).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test_large_mutation::test_committed_doc_matches_render asserts docs/BENCHMARK_v0.7.0.md == lm.render_markdown(summary), so the generated doc cannot carry a hand-added banner. Drop the HISTORICAL banner from it (the title already version-stamps it "(v0.7.0)"); its historical status is conveyed by README + BENCHMARK_CURRENT.md (which names it as the historical source). The binding-lifecycle doc has no byte-match guard, so it keeps its banner. Updated test_benchmark_evidence to match: binding-lifecycle by banner, v0.7.0 by version-stamped title + the current doc's cross-reference. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…items

A 5-lens adversarial review (BLOCK verdict) reproduced 6 real defects; all fixed red-green, plus two hygiene items that contradicted stated invariants:

1. Config-key over-binding broke "default unchanged unless opt-in": a claim backticking a common config word (e.g. `dependencies`) bound pyproject.toml and could newly refuse a clean `verify` with exit 6. Fix: _CONFIG_KEY_STOPWORDS (PEP 621 / common keys) on the config axis; specific keys (max_workers) still bind. Regression test reproduces the exit-6.
2/3. SECURITY.md + action/README.md still said trusted-base was "not yet implemented", which is false on this branch. Updated both to describe checker_trust: base as shipped (with the non-sandbox residual); added a guard test so the drift can't recur.
4. `code:` false PASS on an f-string docstring: code_only_python now recognises ast.JoinedStr docstrings.
5. `code:` false FAIL on a real string co-located on a docstring's line: docstrings are now blanked by AST node SPAN, not whole line.
6. `py-const` PASSed on value-TYPE drift (30/30.0, 1/True, 0/False) via Python ==; it now requires matching type before ==. Documented + red-green tested.

Hygiene: warrant-quality _run_mutated refuses a `../`-escaping file operand (its docstring promised containment); check_signature wraps comparison in _PARSE_ERRORS so a pathological signature ERRORs within pyast. Added an end-to-end ERROR-never-BROKEN test for the new C3 forms (non-literal RHS -> ERRORED, exit 5, never BROKEN). 658 non-slow tests pass; lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
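
Defect 6 is easy to reproduce in plain Python, because `30 == 30.0` and `1 == True` are both true. The sketch below shows one way to require a matching type before comparing values. `check_const` is a hypothetical stand-in for the function in dorian/pyast.py, it only inspects plain top-level assignments, and treating a missing constant as FAIL is an assumption.

```python
import ast


def check_const(source, name, expected):
    """Check that a module-level constant has the expected value AND type."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return "ERROR"

    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        if not any(isinstance(t, ast.Name) and t.id == name for t in node.targets):
            continue
        try:
            # literal_eval accepts an AST node and never executes code.
            actual = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            return "ERROR"  # non-literal right-hand side: cannot be judged
        # Compare the top-level type first, so 30 vs 30.0 and 1 vs True differ.
        same = type(actual) is type(expected) and actual == expected
        return "PASS" if same else "FAIL"
    return "FAIL"  # assumption: a missing constant fails the claim


SETTINGS = "TIMEOUT = 30\nDEBUG = True\nLIMIT = compute_limit()\n"
```

Only the outermost type is compared here, so a list holding `1` would still equal a list holding `True`; a stricter check would need to recurse into containers.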

All V1 strengthening work packages (WP1-WP9) are implemented, tested, and documented; the 5-lens adversarial review's BLOCK findings are all resolved with regression tests; 733 tests pass (incl. slow); lint clean. Bump the three version surfaces (pyproject / __init__ / uv.lock) to the V1 release candidate. No tag, push, or publish. rc1 (not final 1.0.0) is honest: the candidate invites real-repo benchmark validation and the explicitly-deferred post-V1 items (declarative-structural checkers, route/SQL binding indices, YAML config binding, audit-event atomicity) documented in docs/V1_SCOPE.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…st-fix) Re-ran large-mutation / binding-lifecycle / realworld at commit b7376e7 (1.0.0rc1), after the adversarial-review fixes: figures identical (large-mutation P=R=0.93, 11.6x/10.4x; binding-lifecycle 808 pairs 0.54->1.00 selection, 1.00 alarm; realworld 2/1/2), confirming the fixes don't touch the benchmarked paths. Version/commit stamps updated; the version-stamp evidence test now reads the live pyproject version. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Final evidence-backed report: version gate, per-WP status, commands+results, verification evidence, trigger-vs-truth preservation, security posture, benchmark posture, remaining risks/non-goals, and the release decision (1.0.0rc1 candidate; no tag/push/publish). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Independent release audit (FIXED_NEEDED) findings, all repaired:

Blockers:
- docs/START_HERE.md still called trusted-base "(not yet implemented)", a user-facing CI entry-point doc the prior fix missed; now describes it as implemented (V1).
- docs/BENCHMARK_BINDING_LIFECYCLE.md banner said the current rerun was "0.11.0" while the branch is 1.0.0rc1 (and BENCHMARK_CURRENT says 1.0.0rc1); corrected the version.
- internal program docs (V1_IMPLEMENTATION_TRACKER.md, V1_ALIGNMENT_REPORT.md) were tracked; gitignored + git rm --cached (kept on disk as provenance). Also gitignore the research report, audit gate, release notes, and tool dirs (.claude/, .gitnexus/). docs/V1_SCOPE.md stays tracked (it is a public doc).

Should-fixes:
- docs/ROADMAP_BACKLOG.md trusted-base item flipped DEFER/HUMAN-REVIEW -> SHIPPED (V1).
- c3_ref.py module docstring now documents the code: form (was omitted) and the py-const value-AND-type rule.
- action.yml / action/README.md drop the stale 'dorian-vwp==0.6.*' pin example (no PyPI release yet) for the git source spec.
- docs/BENCHMARK_CURRENT.md labels the metric commit vs the (docs-only) release commit.

Hardened tests: test_no_live_doc_calls_trusted_base_unimplemented scans ALL live docs (not just SECURITY.md/action README); warrant-quality path-escape test pins the containment guard; benchmark-evidence commit-stamp check is version-agnostic. 660 non-slow tests pass; lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… commit The benchmarks were re-run during the release audit at 33e9eaf and are identical (large-mutation P=R=0.93, 11.6x/10.4x; binding-lifecycle 808 pairs 0.54->1.00, precision/recall 1.00; realworld 2/1/2). Stamp the metric commit as 33e9eaf; the tagged release commit is only this docs re-stamp (git diff 33e9eaf..HEAD -- src bench is empty). Fixes the earlier note which referenced b7376e7 and predated the c3_ref docstring edit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- intro blockquote: note checker_trust: base as the public/fork trust root (still not a sandbox), instead of flatly "not public CI for forked PRs" now that trusted-base shipped.
- roadmap: "tagged release" is done (v1.0.0rc1 prerelease); only PyPI trusted publishing remains.

Post-tag branch update (the v1.0.0rc1 tag stays frozen at 24ae7c8); folds into the next tag / the PR to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-15T15:25:41Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR releases dorian v1.0.0rc1, adding five V1 features: Python-AST structural C3 checker forms (py-signature:, py-const:, code:); trusted-base checker-source mode for fork PR security; multi-index TOML/JSON config-key binding; advisory checker-strength/risk diagnostics; and an offline warrant-quality mutation harness. The version is bumped from 0.11.0 to 1.0.0rc1 with corresponding documentation and security boundary updates.

Changes

dorian V1 Feature Set

| Layer / File(s) | Summary |
|---|---|
| **Python AST structural C3 checkers**: `src/dorian/pyast.py`, `src/dorian/checkers/c3_ref.py`, `src/dorian/seal.py`, `src/dorian/bindings.py`, `spec/checkers.md`, `docs/AGENT_CLAIMS.md`, `tests/test_pystructural.py`, `tests/test_semantic_context.py` | Adds `pyast.py` implementing `code_only_python`, `check_signature`, and `check_const` (AST-based, no execution); extends `c3_ref.check()` with `py-signature:`, `py-const:`, and `code:` (comments/docstrings stripped) branches; centralizes file-operand prefix detection in `seal.py` via `_C3_FILE_OPERAND_FORMS`; propagates constant to `bindings.py`; spec and full test coverage added. |
| **Trusted-base checker-source mode**: `src/dorian/revalidate.py`, `src/dorian/cli.py`, `src/dorian/commands.py`, `action/action.yml`, `action/README.md`, `SECURITY.md`, `docs/SECURITY_BOUNDARY.md`, `docs/TRUSTED_BASE_ACTION_DESIGN.md`, `tests/test_trusted_base.py`, `tests/test_action_security_defaults.py`, `tests/test_claude_code_docs.py`, `tests/test_render_md.py` | `revalidate()` gains `checker_source` parameter; `_load_base_warrant()` loads and integrity-verifies base sidecars with fail-closed caching; PR-added/modified executable checkers are skipped under `"base"` mode; `RevalResult` gains a `notes` field surfaced in text/markdown renders; `--checker-source` CLI option and `DORIAN_CHECKER_SOURCE` env added; `checker_trust` input wired into Action via `DORIAN_CHECKER_SOURCE`; security docs updated from "not implemented" to shipped V1; full end-to-end and CLI tests added. |
| **Multi-index config-key binding**: `src/dorian/symbol_index.py`, `src/dorian/commands.py`, `tests/test_config_binding.py`, `tests/test_symbol_index.py` | Adds `config_key_index`, `claim_config_watch_paths`, `claim_watch_paths`, and `ambiguous_config_mentions` to `symbol_index.py`; updates `cmd_verify`, `cmd_bind_suggest`, and `cmd_rebind` to use unified multi-index watch paths; stopword set prevents over-binding on common PEP 621 keys; comprehensive test suite including end-to-end revocation on config drift. |
| **Checker strength and claim risk diagnostics**: `src/dorian/strength.py`, `src/dorian/commands.py`, `tests/test_strength.py` | New `strength.py` computes truth-strength for C1/C3/C4/C5 checkers, detects kind-vs-strength adequacy mismatches, statically lints C4 test nodes for vacuous assertions, and rolls binding flags into per-claim `high/medium/low` risk; integrated into `_emit_binding_gate_warnings` and `cmd_bindings` output; full test suite with CLI integration test. |
| **Warrant-quality offline mutation harness**: `bench/warrant_quality.py`, `src/dorian/commands.py`, `tests/test_warrant_quality.py` | Adds `bench/warrant_quality.py` implementing deterministic C3 mutation generation (`symbol:`, `py-const:`, `py-signature:`), sandboxed mutation running (path-escape blocked), verdict-to-outcome mapping (`caught/missed/brittle/ok/ceiling`), claim quality scoring (`weak/brittle/strong/unscored`), and JSON/text output; registered as `dorian bench warrant-quality`; tests include sandbox-containment and CLI smoke test. |
| **Version bump, benchmark docs, and V1 scope**: `pyproject.toml`, `src/dorian/__init__.py`, `docs/V1_SCOPE.md`, `docs/BENCHMARK_CURRENT.md`, `docs/BENCHMARK_BINDING_LIFECYCLE.md`, `docs/ROADMAP_BACKLOG.md`, `docs/START_HERE.md`, `README.md`, `.gitignore`, `tests/test_benchmark_evidence.py` | Version bumped to `1.0.0rc1`; new `V1_SCOPE.md` enumerates V1 capabilities and non-goals; `BENCHMARK_CURRENT.md` documents current benchmark environment and results; historical benchmark docs labeled; `README.md` updated with trust state semantics, command surface changes, and benchmark historical references; benchmark hygiene enforced via `test_benchmark_evidence.py`. |

Sequence Diagram(s)

sequenceDiagram
  participant PR as PR CI (Action)
  participant cmd as commands.cmd_revalidate
  participant reval as revalidate()
  participant base as _load_base_warrant()
  participant git as git base ref
  participant checker as _check_claim()

  PR->>cmd: checker_trust=base, since=base_sha
  cmd->>reval: checker_source="base", since=base_sha
  loop per artifact claim
    reval->>base: artifact_uri, base_sha
    base->>git: read .warrant sidecar at base_sha
    git-->>base: raw bytes or FileNotFoundError
    base-->>reval: Warrant | None
    alt base warrant missing or tampered
      reval-->>reval: claim → ERRORED, no execution
    else PR added/modified executable checker
      reval-->>reval: record trust-root note, use base spec
      reval->>checker: base-approved CheckerSpec
      checker-->>reval: PASS | FAIL | ERROR
    else spec unchanged
      reval->>checker: base spec
      checker-->>reval: PASS | FAIL | ERROR
    end
  end
  reval-->>cmd: RevalResult(notes=[...])
  cmd-->>PR: exit code + sticky PR comment

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

ajaysurya1221/dorian#4: Directly modifies src/dorian/bindings.py's _checker_named_files C3 branch, the same function updated here to use _C3_FILE_OPERAND_FORMS.
ajaysurya1221/dorian#3: Introduces referenced_paths in src/dorian/seal.py, the same function updated here to use _C3_FILE_OPERAND_FORMS for expanded C3 operand handling.

Poem

🐇 A rabbit hops through AST trees,
Checking signatures with greatest of ease!
Base warrants guarded, no forks may deceive,
Config keys indexed, so watchers don't grieve.
Mutations measured, the warrant stands strong—
V1 has shipped, and the wait wasn't long! 🎉

ajay-dev-2112 and others added 13 commits June 15, 2026 17:29

ajaysurya1221 merged commit c886610 into main Jun 15, 2026
3 of 4 checks passed

coderabbitai Bot mentioned this pull request Jun 17, 2026

release: dorian v1.0.1 — hardening, DX, interop + first real cross-PR catch #8

Merged

ajaysurya1221 merged 13 commits into main from dorian-v1-strengthening
