dorian V1 strengthening (1.0.0rc1) #6

Merged
ajaysurya1221 merged 13 commits into main from dorian-v1-strengthening
Jun 15, 2026

@ajaysurya1221 ajaysurya1221 commented Jun 15, 2026

Owner

Summary

V1 strengthening program (from v0.11.0), released as the prerelease tag v1.0.0rc1 and
independently release-audited (FIXED_PASS). All changes are additive and backward-compatible.

  • Python structural checkers py-signature: / py-const: (AST; close the symbol: existence
    and string:/regex: comment-survival ceilings; py-const: compares value and type).
  • Semantic-context code: — regex over comment/docstring-stripped Python.
  • Checker-strength / claim-risk diagnostics in dorian bindings (+ C4 adequacy lint); advisory.
  • Multi-index binding — config keys in tracked .toml/.json (zero runtime deps; YAML excluded).
  • Trusted-base revalidate --checker-source base / Action checker_trust: base — base-approved
    checker specs only for public/fork PRs (a trust root, not a sandbox).
  • dorian bench warrant-quality — offline per-claim mutation scoring.
  • Docs: V1_SCOPE.md, version-stamped BENCHMARK_CURRENT.md, historical benchmark labels.

Findings from two adversarial reviews (implementation BLOCK → 6 must-fixes; independent release
audit FIXED_PASS → 2 doc-drift blockers + should-fixes) are all resolved with regression tests.

Invariants preserved: ERROR ≠ BROKEN; checkers read-only (except C4/C5-shell); binding is
trigger-only; zero runtime dependencies. Release candidate, not final 1.0.0 (real-repo
public micro-benchmark + RC caveats remain post-V1; see docs/V1_SCOPE.md).

Test Plan

  • CI matrix (Python 3.11 / 3.12 / 3.13) green on this PR
  • Local at the release commit: uv run pytest → 735 passed (incl. slow); ruff clean; uv build +
    clean-venv install → dorian 1.0.0rc1; benchmarks reproduce unchanged; trusted-base exploit
    matrix (10 cases) passes.

Summary by CodeRabbit

  • New Features

    • Python structural checkers: signature validation, constant value verification, semantic code matching
    • Trusted-base mode for secure pull request validation
    • Automatic config-key binding from TOML/JSON files
    • Warrant-quality benchmark harness
    • Checker strength and claim risk diagnostics
  • Documentation

    • V1 scope and boundaries documented
    • Updated security guidance for untrusted sources
    • Versioned benchmark results published

ajay-dev-2112 and others added 13 commits June 15, 2026 17:29
…ength diagnostics

WP3/WP4/WP2/WP6 of the v0.11.0 -> V1 strengthening program (research-report driven).

- C3 py-signature:/py-const: structural checkers (dorian/pyast.py): AST-based, close
  the symbol-existence and string/regex comment-survival ceilings; gutted-body remains
  the documented ceiling (only C4 catches a body change behind an unchanged signature).
- C3 code: semantic-context regex over comment/docstring-stripped Python (same ReDoS
  worker-timeout as regex:), so a fact surviving only in a comment/docstring FAILs.
- checker-strength / claim-risk diagnostics (dorian/strength.py): classify truth
  strength per checker, flag kind-vs-strength adequacy mismatches, advisory C4
  zero/constant-assertion lint; surfaced in `dorian bindings` (JSON + human) and the
  opt-in --binding-gate warn output. Advisory only — never a verdict/trust/exit change.
- watch derivation + binding diagnostics recognize the new C3 forms (seal, bindings).
- spec/checkers.md + docs/AGENT_CLAIMS.md document the new grammars and the ceiling.

+58 tests (test_pystructural, test_semantic_context, test_strength); 619 non-slow pass.
ERROR-vs-FAIL discipline preserved; trigger-vs-truth split made explicit, not blurred.
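
The `code:` idea above (match a regex only against text that is real code) can be sketched roughly as follows. This is an illustrative approximation, not the contents of `dorian/pyast.py`: the function name `code_only` is hypothetical, the ReDoS worker-timeout is omitted, and column offsets are treated as character offsets, which holds for ASCII source only.

```python
import ast
import io
import re
import tokenize


def code_only(source: str) -> str:
    """Blank comments and docstrings with spaces, keeping the line layout."""
    lines = [list(line) for line in source.splitlines(keepends=True)]

    def blank(start, end):
        # Overwrite the span start..end (row is 1-based, col is 0-based).
        (srow, scol), (erow, ecol) = start, end
        for row in range(srow, erow + 1):
            line = lines[row - 1]
            lo = scol if row == srow else 0
            hi = ecol if row == erow else len(line)
            for col in range(lo, min(hi, len(line))):
                if line[col] != "\n":
                    line[col] = " "

    # Comments are not in the AST, so find them with the tokenizer.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            blank(tok.start, tok.end)

    # Docstrings: first statement of a module, class, or function body.
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        if not isinstance(first, ast.Expr):
            continue
        value = first.value
        plain = isinstance(value, ast.Constant) and isinstance(value.value, str)
        if plain or isinstance(value, ast.JoinedStr):
            blank((value.lineno, value.col_offset),
                  (value.end_lineno, value.end_col_offset))

    return "".join("".join(line) for line in lines)


source = (
    "def retry():\n"
    '    """Retries up to max_retries = 5 times."""\n'
    "    max_retries = 3  # was max_retries = 5\n"
    "    return max_retries\n"
)
pattern = r"max_retries\s*=\s*5"
print(bool(re.search(pattern, source)))             # True: survives in prose
print(bool(re.search(pattern, code_only(source))))  # False: not in real code
```

Blanking by node span rather than by whole line keeps any real code that shares a line with a docstring, and treating a leading f-string as a docstring mirrors the two `code:` fixes listed in the later must-fix commit.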

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(WP7)

revalidate --checker-source {head,base} (default head) + Action checker_trust input.
base mode resolves each candidate claim's checker SPEC from the --since (base) ref and
runs it against PR-head sources, so a PR-added or PR-modified executable (C4/C5 shell)
checker is never executed and a rewritten checker cannot self-attest a verdict (base
spec wins; the change is surfaced as a trust-root note). Fail-closed: a missing/tampered
base sidecar ERRORs (never executed), never BROKEN, never green. Composes with
deny-exec. NOT a sandbox — a base-approved pytest checker can still run head code, stated
in every surface.

- revalidate.py: checker_source param, _load_base_warrant (integrity-checked base
  sidecar via gitio.file_at_ref), RevalResult.notes, text/md rendering of notes.
- cli.py/commands.py: --checker-source flag + DORIAN_CHECKER_SOURCE env fallback;
  base requires --since.
- action.yml: checker_trust input (default head) -> DORIAN_CHECKER_SOURCE; README + Inputs
  table updated (also documents the pre-existing deny_exec/deny_shell inputs).
- docs: TRUSTED_BASE_ACTION_DESIGN status -> IMPLEMENTED; SECURITY_BOUNDARY public-fork
  checklist updated (trust-root conditions met; sandboxing still out of scope).
- tests/test_trusted_base.py: the §6 exploit matrix (10 cases) — each "executed?" case
  proven by a sentinel touch that must NOT appear under base mode.
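
The selection rule described in this commit (base spec wins, missing base fails closed) can be sketched as pure logic. The names `CheckerSpec` and `select_spec` and the note strings below are hypothetical stand-ins, not the types in `revalidate.py`; integrity checking of the base sidecar is reduced here to "present or absent".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckerSpec:
    """Hypothetical stand-in for a claim's checker specification."""
    kind: str  # e.g. "C3" or "C4"
    spec: str  # e.g. "symbol:foo" or a pytest node id


def select_spec(base, head, checker_source="head"):
    """Return (spec_to_run, forced_verdict, note) for one claim."""
    if checker_source == "head":
        # Default mode: the PR's own spec is used unchanged.
        return head, None, None
    if base is None:
        # Fail closed: no base-approved spec means nothing is executed,
        # and the claim is reported as ERROR rather than BROKEN or green.
        return None, "ERROR", "no base-approved checker spec; not executed"
    if head != base:
        # The PR added or rewrote the checker: ignore it, surface a note.
        return base, None, "checker spec changed in PR; base spec used"
    return base, None, None


base = CheckerSpec("C4", "tests/test_api.py::test_limits")
tampered = CheckerSpec("C4", "tests/test_evil.py::test_always_green")

print(select_spec(base, tampered, "base"))
print(select_spec(None, tampered, "base"))
```

Even in this sketch the residual stated above is visible: the base-approved spec that is returned may itself be a pytest node, and running it still executes head code.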

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extends binding beyond Python definers/console-scripts to config keys in tracked
.toml/.json files: a claim mentioning a config key is re-checked when the defining
config file changes. Conservative and trigger-only (never proves truth):

- symbol_index.config_key_index: key -> tracked .toml/.json files + unparseable list.
  YAML deliberately excluded (parsing needs a third-party dep; core stays zero-dep).
- claim_config_watch_paths + claim_watch_paths (unified symbol+script+config union);
  verify/rebind now widen with the merged watch set.
- ambiguous_config_mentions: a key in >1 file is skipped (a wrong watch is a false
  alarm) and surfaced via verify warnings + bind-suggest, never guessed.
- unparseable supported config files are surfaced as a diagnostic, never silent.
- bind-suggest gains provenance (bind (symbol) vs bind (config)) + ambiguous-config
  + unparseable-config lines; JSON adds bind_config/ambiguous_config/unparseable_config.
- config_key_index degrades to empty on a non-git repo (never blocks).

Updated test_symbol_index pyproject-script expectation (a script-name claim now also
watches pyproject.toml where the script is declared) and the trusted-base design
doc-guard (now IMPLEMENTED). 639 non-slow tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lity (WP8)

Offline, per-claim evidence: for each claim it derives deterministic mutations from the
checker grammar and records whether the verdict matched expectation —
- falsify (rename symbol / reassign const / change param): expect FAIL; a PASS is a MISS;
- benign (trailing comment): expect PASS; a FAIL is BRITTLE (false alarm);
- ceiling (content drift keeping an existence symbol): expect PASS, recorded as the
  documented trigger-vs-truth ceiling, never a penalty.
ERROR (e.g. an executable checker under --deny-exec) is its own bucket, never a miss.
Output is deterministic (no timestamps/randomness) and never mutates the real repo —
each mutation runs against a throwaway copy of only the file the checker reads.

Honest scope: structural/existence C3 forms are mutation-scored; string/regex/code, typed
C5, C1, C4 are reported with strength and `mutation: unsupported` (no fabricated mutation).
Registered as the `warrant-quality` bench subcommand. 7 tests; 645 non-slow pass.
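
The expectation table above maps naturally onto a small lookup. The sketch below is an assumption-laden illustration, not `bench/warrant_quality.py`: the outcome labels follow the ones named in the review summary (caught/missed/brittle/ok/ceiling), while the treatment of a FAIL on a ceiling mutation and the per-claim roll-up rule are guesses made for the example.

```python
def classify(mutation_kind: str, verdict: str) -> str:
    """Map one (mutation kind, checker verdict) pair to an outcome bucket."""
    if verdict == "ERROR":
        return "error"  # its own bucket, never counted as a miss
    table = {
        ("falsify", "FAIL"): "caught",
        ("falsify", "PASS"): "missed",
        ("benign", "PASS"): "ok",
        ("benign", "FAIL"): "brittle",
        ("ceiling", "PASS"): "ceiling",  # documented limit, not a penalty
        ("ceiling", "FAIL"): "caught",   # assumption: stronger than expected
    }
    return table[(mutation_kind, verdict)]


def claim_quality(outcomes: list[str]) -> str:
    """Roll outcomes up to one label (hypothetical rule for illustration)."""
    scored = [o for o in outcomes if o in {"caught", "missed", "brittle", "ok"}]
    if not scored:
        return "unscored"
    if "missed" in scored:
        return "weak"
    if "brittle" in scored:
        return "brittle"
    return "strong"


runs = [("falsify", "FAIL"), ("benign", "PASS"), ("ceiling", "PASS")]
outcomes = [classify(kind, verdict) for kind, verdict in runs]
print(outcomes)                 # ['caught', 'ok', 'ceiling']
print(claim_quality(outcomes))  # strong
```

Because the mapping has no clock or random input, the same claims and mutations always yield the same report, which matches the determinism requirement stated above.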

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…WP1)

- docs/BENCHMARK_CURRENT.md: version- and commit-stamped reruns of the reproducible
  suites on current code — large-mutation (240 pairs, P=R=0.93, 11.6x/10.4x FP reduction),
  binding-lifecycle (808 pairs, selection recall 0.54->1.00, alarm precision/recall 1.00,
  0 errored), realworld (5 cases 2/1/2), and the new warrant-quality harness. The reruns
  MATCH the historical runs (same content-derived run_id), proving the V1 changes are
  additive and do not regress the benchmarks. Includes a what-this-does-NOT-prove block.
- HISTORICAL banners on docs/BENCHMARK_v0.7.0.md (v0.7.0) and
  docs/BENCHMARK_BINDING_LIFECYCLE.md (0.9.0), each pointing to BENCHMARK_CURRENT.md;
  the historical numbers are preserved verbatim.
- docs/V1_SCOPE.md: what V1 strengthening means and does NOT mean (no universal semantic
  correctness; trusted-base is a trust root not a sandbox; config binding is TOML/JSON only;
  code:/structural are Python-only; extractor stays draft; carried-forward limitations).
- README: trust-state legend (WARRANTED born -> TRUSTED/DEGRADED/REVOKED/UNKNOWN), historical
  labels on the benchmark citations, command-surface entries for the new C3 forms,
  checker-strength in bindings, config provenance in bind-suggest, checker-source base, and
  bench warrant-quality.
- tests/test_benchmark_evidence.py: wording guards (historical docs labeled; current doc
  version/commit-stamped with a non-overclaim block; README links current; V1_SCOPE boundary).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_large_mutation::test_committed_doc_matches_render asserts docs/BENCHMARK_v0.7.0.md
== lm.render_markdown(summary), so the generated doc cannot carry a hand-added banner.
Drop the HISTORICAL banner from it (the title already version-stamps it "(v0.7.0)"); its
historical status is conveyed by README + BENCHMARK_CURRENT.md (which names it as the
historical source). The binding-lifecycle doc has no byte-match guard, so it keeps its
banner. Updated test_benchmark_evidence to match: binding-lifecycle by banner, v0.7.0 by
version-stamped title + the current doc's cross-reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…items

A 5-lens adversarial review (BLOCK verdict) reproduced 6 real defects; all fixed
red-green, plus two hygiene items that contradicted stated invariants:

1. Config-key over-binding broke "default unchanged unless opt-in": a claim backticking a
   common config word (e.g. `dependencies`) bound pyproject.toml and could newly refuse a
   clean `verify` with exit 6. Fix: _CONFIG_KEY_STOPWORDS (PEP 621 / common keys) on the
   config axis; specific keys (max_workers) still bind. Regression test reproduces the exit-6.
2/3. SECURITY.md + action/README.md still said trusted-base was "not yet implemented" —
   false on this branch. Updated both to describe checker_trust: base as shipped (with the
   non-sandbox residual); added a guard test so the drift can't recur.
4. `code:` false PASS on an f-string docstring — code_only_python now recognises
   ast.JoinedStr docstrings.
5. `code:` false FAIL on a real string co-located on a docstring's line — docstrings are
   now blanked by AST node SPAN, not whole line.
6. `py-const` PASSed on value-TYPE drift (30/30.0, 1/True, 0/False) via Python == — now
   requires matching type before ==. Documented + red-green tested.
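
Item 6 is easy to reproduce, because Python's `==` treats `30`, `30.0`, `1` and `True` as interchangeable. A minimal sketch of the type-then-value rule, assuming a hypothetical helper `const_matches` that only looks at simple module-level assignments:

```python
import ast


def const_matches(source: str, name: str, expected) -> bool:
    """Compare a module-level constant by type first, then by value."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if name in targets:
            # literal_eval raises ValueError on a non-literal right-hand
            # side; this sketch lets that propagate as an error condition.
            actual = ast.literal_eval(node.value)
            return type(actual) is type(expected) and actual == expected
    raise LookupError(f"{name} is not assigned at module level")


print(30 == 30.0, 1 == True)                               # True True
print(const_matches("TIMEOUT = 30\n", "TIMEOUT", 30))      # True
print(const_matches("TIMEOUT = 30.0\n", "TIMEOUT", 30))    # False
print(const_matches("RETRIES = True\n", "RETRIES", 1))     # False
```

Using `type(a) is type(b)` rather than `isinstance` matters here: `bool` is a subclass of `int`, so an `isinstance` check would still let `True` stand in for `1`.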

Hygiene: warrant-quality _run_mutated refuses a `../`-escaping file operand (its docstring
promised containment); check_signature wraps comparison in _PARSE_ERRORS so a pathological
signature ERRORs within pyast. Added an end-to-end ERROR-never-BROKEN test for the new C3
forms (non-literal RHS -> ERRORED, exit 5, never BROKEN).
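
A containment guard of the kind described for `_run_mutated` can be sketched with `pathlib`. The function name `is_contained` and the example root are hypothetical, and the sketch assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def is_contained(root: str, operand: str) -> bool:
    """True only if `operand`, resolved under `root`, stays inside `root`."""
    root_path = Path(root).resolve()
    # Joining with an absolute operand discards root_path, so absolute
    # paths outside the root are rejected by the same check.
    target = (root_path / operand).resolve()
    return target.is_relative_to(root_path)


print(is_contained("/tmp/sandbox", "src/app.py"))     # True
print(is_contained("/tmp/sandbox", "../etc/passwd"))  # False
print(is_contained("/tmp/sandbox", "src/../../x"))    # False
```

Resolving before comparing is the important step: a textual check for a leading `../` would miss an operand such as `src/../../x`.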

658 non-slow tests pass; lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All V1 strengthening work packages (WP1-WP9) are implemented, tested, and documented;
the 5-lens adversarial review's BLOCK findings are all resolved with regression tests;
733 tests pass (incl. slow); lint clean. Bump the three version surfaces
(pyproject / __init__ / uv.lock) to the V1 release candidate. No tag, push, or publish.

rc1 (not final 1.0.0) is honest: the candidate invites real-repo benchmark validation and
the explicitly-deferred post-V1 items (declarative-structural checkers, route/SQL binding
indices, YAML config binding, audit-event atomicity) documented in docs/V1_SCOPE.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…st-fix)

Re-ran large-mutation / binding-lifecycle / realworld at commit b7376e7 (1.0.0rc1),
after the adversarial-review fixes: figures identical (large-mutation P=R=0.93,
11.6x/10.4x; binding-lifecycle 808 pairs 0.54->1.00 selection, 1.00 alarm; realworld
2/1/2), confirming the fixes don't touch the benchmarked paths. Version/commit stamps
updated; the version-stamp evidence test now reads the live pyproject version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final evidence-backed report: version gate, per-WP status, commands+results, verification
evidence, trigger-vs-truth preservation, security posture, benchmark posture, remaining
risks/non-goals, and the release decision (1.0.0rc1 candidate; no tag/push/publish).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Independent release audit (FIXED_NEEDED) findings, all repaired:

Blockers:
- docs/START_HERE.md still called trusted-base "(not yet implemented)" — a user-facing
  CI entry-point doc the prior fix missed; now describes it as implemented (V1).
- docs/BENCHMARK_BINDING_LIFECYCLE.md banner said the current rerun was "0.11.0" while the
  branch is 1.0.0rc1 (and BENCHMARK_CURRENT says 1.0.0rc1) — corrected the version.
- internal program docs (V1_IMPLEMENTATION_TRACKER.md, V1_ALIGNMENT_REPORT.md) were tracked;
  gitignored + git rm --cached (kept on disk as provenance). Also gitignore the research
  report, audit gate, release notes, and tool dirs (.claude/, .gitnexus/). docs/V1_SCOPE.md
  stays tracked (it is a public doc).

Should-fixes:
- docs/ROADMAP_BACKLOG.md trusted-base item flipped DEFER/HUMAN-REVIEW -> SHIPPED (V1).
- c3_ref.py module docstring now documents the code: form (was omitted) and the py-const
  value-AND-type rule.
- action.yml / action/README.md drop the stale 'dorian-vwp==0.6.*' pin example (no PyPI
  release yet) for the git source spec.
- docs/BENCHMARK_CURRENT.md labels the metric commit vs the (docs-only) release commit.

Hardened tests: test_no_live_doc_calls_trusted_base_unimplemented scans ALL live docs (not
just SECURITY.md/action README); warrant-quality path-escape test pins the containment guard;
benchmark-evidence commit-stamp check is version-agnostic. 660 non-slow tests pass; lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… commit

The benchmarks were re-run during the release audit at 33e9eaf and are identical
(large-mutation P=R=0.93, 11.6x/10.4x; binding-lifecycle 808 pairs 0.54->1.00,
precision/recall 1.00; realworld 2/1/2). Stamp the metric commit as 33e9eaf; the
tagged release commit is only this docs re-stamp (git diff 33e9eaf..HEAD -- src bench
is empty). Fixes the earlier note which referenced b7376e7 and predated the c3_ref
docstring edit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- intro blockquote: note checker_trust: base as the public/fork trust root (still not a
  sandbox), instead of flatly "not public CI for forked PRs" now that trusted-base shipped.
- roadmap: "tagged release" is done (v1.0.0rc1 prerelease); only PyPI trusted publishing remains.

Post-tag branch update (the v1.0.0rc1 tag stays frozen at 24ae7c8); folds into the next
tag / the PR to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 15, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR releases dorian v1.0.0rc1, adding five V1 features: Python-AST structural C3 checker forms (py-signature:, py-const:, code:); trusted-base checker-source mode for fork PR security; multi-index TOML/JSON config-key binding; advisory checker-strength/risk diagnostics; and an offline warrant-quality mutation harness. The version is bumped from 0.11.0 to 1.0.0rc1 with corresponding documentation and security boundary updates.

Changes

dorian V1 Feature Set

Layer: Python AST structural C3 checkers
File(s): src/dorian/pyast.py, src/dorian/checkers/c3_ref.py, src/dorian/seal.py, src/dorian/bindings.py, spec/checkers.md, docs/AGENT_CLAIMS.md, tests/test_pystructural.py, tests/test_semantic_context.py
Summary: Adds pyast.py implementing code_only_python, check_signature, and check_const (AST-based, no execution); extends c3_ref.check() with py-signature:, py-const:, and code: (comments/docstrings stripped) branches; centralizes file-operand prefix detection in seal.py via _C3_FILE_OPERAND_FORMS; propagates constant to bindings.py; spec and full test coverage added.

Layer: Trusted-base checker-source mode
File(s): src/dorian/revalidate.py, src/dorian/cli.py, src/dorian/commands.py, action/action.yml, action/README.md, SECURITY.md, docs/SECURITY_BOUNDARY.md, docs/TRUSTED_BASE_ACTION_DESIGN.md, tests/test_trusted_base.py, tests/test_action_security_defaults.py, tests/test_claude_code_docs.py, tests/test_render_md.py
Summary: revalidate() gains checker_source parameter; _load_base_warrant() loads and integrity-verifies base sidecars with fail-closed caching; PR-added/modified executable checkers are skipped under "base" mode; RevalResult gains a notes field surfaced in text/markdown renders; --checker-source CLI option and DORIAN_CHECKER_SOURCE env added; checker_trust input wired into Action via DORIAN_CHECKER_SOURCE; security docs updated from "not implemented" to shipped V1; full end-to-end and CLI tests added.

Layer: Multi-index config-key binding
File(s): src/dorian/symbol_index.py, src/dorian/commands.py, tests/test_config_binding.py, tests/test_symbol_index.py
Summary: Adds config_key_index, claim_config_watch_paths, claim_watch_paths, and ambiguous_config_mentions to symbol_index.py; updates cmd_verify, cmd_bind_suggest, and cmd_rebind to use unified multi-index watch paths; stopword set prevents over-binding on common PEP 621 keys; comprehensive test suite including end-to-end revocation on config drift.

Layer: Checker strength and claim risk diagnostics
File(s): src/dorian/strength.py, src/dorian/commands.py, tests/test_strength.py
Summary: New strength.py computes truth-strength for C1/C3/C4/C5 checkers, detects kind-vs-strength adequacy mismatches, statically lints C4 test nodes for vacuous assertions, and rolls binding flags into per-claim high/medium/low risk; integrated into _emit_binding_gate_warnings and cmd_bindings output; full test suite with CLI integration test.

Layer: Warrant-quality offline mutation harness
File(s): bench/warrant_quality.py, src/dorian/commands.py, tests/test_warrant_quality.py
Summary: Adds bench/warrant_quality.py implementing deterministic C3 mutation generation (symbol:, py-const:, py-signature:), sandboxed mutation running (path-escape blocked), verdict-to-outcome mapping (caught/missed/brittle/ok/ceiling), claim quality scoring (weak/brittle/strong/unscored), and JSON/text output; registered as dorian bench warrant-quality; tests include sandbox-containment and CLI smoke test.

Layer: Version bump, benchmark docs, and V1 scope
File(s): pyproject.toml, src/dorian/__init__.py, docs/V1_SCOPE.md, docs/BENCHMARK_CURRENT.md, docs/BENCHMARK_BINDING_LIFECYCLE.md, docs/ROADMAP_BACKLOG.md, docs/START_HERE.md, README.md, .gitignore, tests/test_benchmark_evidence.py
Summary: Version bumped to 1.0.0rc1; new V1_SCOPE.md enumerates V1 capabilities and non-goals; BENCHMARK_CURRENT.md documents current benchmark environment and results; historical benchmark docs labeled; README.md updated with trust state semantics, command surface changes, and benchmark historical references; benchmark hygiene enforced via test_benchmark_evidence.py.
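
The static lint for vacuous C4 assertions mentioned in the strength diagnostics summary could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `src/dorian/strength.py`: the function name `vacuous_tests` and the two finding labels are hypothetical, and only bare `assert` statements are counted, so a test that relies on `pytest.raises` or a helper would be flagged. That imprecision is one reason such a lint is best kept advisory.

```python
import ast


def vacuous_tests(source: str) -> dict[str, str]:
    """Flag test functions with no asserts or only constant asserts."""
    findings = {}
    for node in ast.walk(ast.parse(source)):
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if not (is_def and node.name.startswith("test_")):
            continue
        asserts = [n for n in ast.walk(node) if isinstance(n, ast.Assert)]
        if not asserts:
            findings[node.name] = "zero-assertion"
        elif all(isinstance(a.test, ast.Constant) for a in asserts):
            # e.g. `assert True`, which can never fail
            findings[node.name] = "constant-assertion"
    return findings


tests = (
    "def test_runs():\n"
    "    build()\n"
    "\n"
    "def test_placeholder():\n"
    "    assert True\n"
    "\n"
    "def test_limit():\n"
    "    assert limit() == 30\n"
)
print(vacuous_tests(tests))
```

The source is only parsed, never imported or run, so the lint itself executes no test code.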

Sequence Diagram(s)

sequenceDiagram
  participant PR as PR CI (Action)
  participant cmd as commands.cmd_revalidate
  participant reval as revalidate()
  participant base as _load_base_warrant()
  participant git as git base ref
  participant checker as _check_claim()

  PR->>cmd: checker_trust=base, since=base_sha
  cmd->>reval: checker_source="base", since=base_sha
  loop per artifact claim
    reval->>base: artifact_uri, base_sha
    base->>git: read .warrant sidecar at base_sha
    git-->>base: raw bytes or FileNotFoundError
    base-->>reval: Warrant | None
    alt base warrant missing or tampered
      reval-->>reval: claim → ERRORED, no execution
    else PR added/modified executable checker
      reval-->>reval: record trust-root note, use base spec
      reval->>checker: base-approved CheckerSpec
      checker-->>reval: PASS | FAIL | ERROR
    else spec unchanged
      reval->>checker: base spec
      checker-->>reval: PASS | FAIL | ERROR
    end
  end
  reval-->>cmd: RevalResult(notes=[...])
  cmd-->>PR: exit code + sticky PR comment

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • ajaysurya1221/dorian#4: Directly modifies src/dorian/bindings.py's _checker_named_files C3 branch, the same function updated here to use _C3_FILE_OPERAND_FORMS.
  • ajaysurya1221/dorian#3: Introduces referenced_paths in src/dorian/seal.py, the same function updated here to use _C3_FILE_OPERAND_FORMS for expanded C3 operand handling.

Poem

🐇 A rabbit hops through AST trees,
Checking signatures with greatest of ease!
Base warrants guarded, no forks may deceive,
Config keys indexed, so watchers don't grieve.
Mutations measured, the warrant stands strong—
V1 has shipped, and the wait wasn't long! 🎉

@ajaysurya1221 ajaysurya1221 merged commit c886610 into main Jun 15, 2026
3 of 4 checks passed

2 participants