From 35487542abb907a081e423f9a9f885c1c8a86089 Mon Sep 17 00:00:00 2001 From: Ajay Surya Date: Wed, 17 Jun 2026 07:55:24 +0530 Subject: [PATCH 01/18] docs(v1): correct live PyPI/rc2 doc drift + extend doc-drift guard (TASK-001) dorian-vwp 1.0.0 is live on PyPI, but several user-facing docs still denied the shipped release (pre-PyPI 'install from source until...' framing, an rc2 latest stamp, a PARTIAL pypi-trusted-publishing backlog item). For a tool whose brand is verification, docs that refute their own release are self-refuting. This corrects every live version/PyPI surface and adds a guard test that scans them so the drift cannot regress. - README.md: lead with 'pip install dorian-vwp'; source install demoted to a secondary 'unreleased changes' option; roadmap line now says v1.0.0 is live. - action/action.yml: stale install-input comment now states PyPI is the default. - action/README.md: pre-PyPI wording replaced with live-PyPI reality; usage example no longer overrides install: to a git spec (uses the dorian-vwp default). - docs/ROADMAP_BACKLOG.md: pypi-trusted-publishing marked DONE. - docs/BENCHMARK_CURRENT.md: version stamp rc2 -> 1.0.0; honesty disclaimer ('not a fresh benchmark claim') preserved, no fresh run fabricated. - tests/test_version_sync.py: new test_no_stale_prepypi_or_rc_vocabulary_in_live_docs scoped to the five live surfaces (archive/CHANGELOG provenance excluded). Co-Authored-By: Claude Opus 4.8 (1M context) --- README.md | 23 +++++++++++----------- action/README.md | 9 +++++---- action/action.yml | 7 ++++--- docs/BENCHMARK_CURRENT.md | 6 +++--- docs/ROADMAP_BACKLOG.md | 9 ++++----- tests/test_version_sync.py | 39 ++++++++++++++++++++++++++++++++++++++ 6 files changed, 66 insertions(+), 27 deletions(-) diff --git a/README.md b/README.md index 42e4b97..3e518d3 100644 --- a/README.md +++ b/README.md @@ -286,8 +286,15 @@ rebuildable at any time with `dorian sync` — and is never committed. 
## Getting started -The distribution is `dorian-vwp`; the import and CLI are `dorian`. The first PyPI release is on the -roadmap — until it lands, install from source: +The distribution is `dorian-vwp`; the import and CLI are `dorian`. Install from PyPI: + +```bash +pip install dorian-vwp # core, zero runtime dependencies +pip install 'dorian-vwp[data]' # + duckdb for parquet data claims +pip install 'dorian-vwp[extract]' # + anthropic for LLM claim drafting (frozen/experimental) +``` + +To install the latest unreleased changes, install from source instead: ```bash pip install 'dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git' @@ -297,14 +304,6 @@ pip install 'dorian-vwp[data] @ git+https://github.com/ajaysurya1221/dorian.git' pip install 'dorian-vwp[extract] @ git+https://github.com/ajaysurya1221/dorian.git' # + anthropic for LLM claim drafting (frozen/experimental) ``` -After the first PyPI release: - -```bash -pip install dorian-vwp # core, zero runtime dependencies -pip install 'dorian-vwp[data]' # + duckdb for parquet data claims -pip install 'dorian-vwp[extract]' # + anthropic for LLM claim drafting (frozen/experimental) -``` - Then run `dorian verify --claims claims.json` on one change. For CI, add the composite [GitHub Action](action/README.md) — it revalidates the claims a pull request touches and posts a sticky PR comment. **Read its @@ -479,8 +478,8 @@ work perishable, so you find out when it expired. ([`docs/BENCHMARK_PUBLIC_REAL_REPOS.md`](docs/BENCHMARK_PUBLIC_REAL_REPOS.md)). These are **reproducible on those frozen SHAs only** — not a real-world performance claim; the trigger and truth layers are reported separately. -- **PyPI trusted publishing** — tagged releases now ship (latest: **`v1.0.0rc2`**, a V1 release - candidate / prerelease); publishing `dorian-vwp` to PyPI via a Trusted Publisher is next. 
+- **PyPI trusted publishing** — `dorian-vwp` is published to PyPI via a Trusted Publisher + (latest: **`v1.0.0`**); `pip install dorian-vwp` installs the released package. Non-goals stay non-goals: no servers, no dashboards, no hosted control plane, no model at check time. Local-first is the design center. diff --git a/action/README.md b/action/README.md index 51b2254..d8bf996 100644 --- a/action/README.md +++ b/action/README.md @@ -28,8 +28,8 @@ jobs: - uses: ajaysurya1221/dorian/action@main with: fail_on: revoked - # until the first PyPI release, install from source: - install: 'dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git' + # install defaults to the published PyPI package (dorian-vwp); + # override only to pin a version or install unreleased changes. ``` `fetch-depth: 0` is required because `dorian revalidate --since` runs @@ -126,12 +126,13 @@ Hard rules either way: | --------------- | -------------------------------------------- | ------------------------------------------------------------------------ | | `fail_on` | `revoked` | when to fail the step: `revoked` (exit 4 only), `degraded` (3 or 4), `never` | | `base` | `${{ github.event.pull_request.base.sha }}` | git ref passed to `dorian revalidate --since` | -| `install` | `dorian-vwp` | pip spec; until the first PyPI release use the git source spec (below), or `.` for checkout installs | +| `install` | `dorian-vwp` | pip spec; defaults to the published PyPI package. Use the git source spec (below) for unreleased changes, or `.` for checkout installs | | `deny_exec` | `false` | refuse to run executable checkers (C4 pytest, C5 shell): they ERROR. 
For untrusted/fork PRs; fail-closed, not a sandbox | | `deny_shell` | `false` | narrower than `deny_exec`: block only C5 shell, still allow C4 pytest | | `checker_trust` | `head` | `head` runs the checked-out checker spec (trusted repos); `base` runs the base-ref spec so PR-authored executable checkers never run (public/fork PRs) | -Until the first PyPI release of `dorian-vwp`, set `install` to a source spec: +To install unreleased changes instead of the published `dorian-vwp` package, set +`install` to a source spec: `install: 'dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git'`. ## Behavior diff --git a/action/action.yml b/action/action.yml index 67d5697..0f9b11e 100644 --- a/action/action.yml +++ b/action/action.yml @@ -23,9 +23,10 @@ inputs: default: ${{ github.event.pull_request.base.sha }} install: description: >- - pip requirement spec for dorian. Until the first PyPI release, use a git - source spec ('dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git'), - pass '.' to install the checked-out source, or pin a tag once published. + pip requirement spec for dorian. Defaults to the published PyPI package + ('dorian-vwp'); pin a version to lock it. Use a git source spec + ('dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git') for + unreleased changes, or pass '.' to install the checked-out source. required: false default: dorian-vwp deny_exec: diff --git a/docs/BENCHMARK_CURRENT.md b/docs/BENCHMARK_CURRENT.md index 373d283..636a7c2 100644 --- a/docs/BENCHMARK_CURRENT.md +++ b/docs/BENCHMARK_CURRENT.md @@ -10,9 +10,9 @@ and are kept as-is for provenance. 
| field | value | | --- | --- | -| dorian version | `1.0.0rc2` (V1 release candidate) | +| dorian version | `1.0.0` | | metric commit | `33e9eaf` (the benchmark figures were measured here, during the release audit) | -| release commit | rc2 changes after the metric commit are release-state tooling, docs, and version metadata; the figures below remain the stamped metric-run evidence, not a fresh rc2 benchmark claim | +| release commit | changes after the metric commit are release-state tooling, docs, and version metadata; the figures below remain the stamped metric-run evidence, not a fresh `1.0.0` benchmark claim | | Python | 3.12.4 | | platform | darwin (CI matrix: 3.11 / 3.12 / 3.13) | | reproduce | `dorian bench large-mutation` · `dorian bench binding-lifecycle` · `dorian bench realworld-usecases` | @@ -21,7 +21,7 @@ These numbers were re-run at the `1.0.0rc1` commit *after* the adversarial-revie landed AND again during the independent release audit, confirming those fixes (py-const type check, `code:` docstring handling, config-key stopwords) did not move the benchmark figures — expected, since the suites exercise C1/C3 (symbol/regex/string/path)/C5, not the new -structural/config-binding paths. The `1.0.0rc2` stamp keeps this current-version doc aligned +structural/config-binding paths. The `1.0.0` stamp keeps this current-version doc aligned with the source package version without upgrading the benchmark claim beyond the recorded metric-run evidence. diff --git a/docs/ROADMAP_BACKLOG.md b/docs/ROADMAP_BACKLOG.md index f0b916f..938ea5f 100644 --- a/docs/ROADMAP_BACKLOG.md +++ b/docs/ROADMAP_BACKLOG.md @@ -79,11 +79,10 @@ before marketing, deterministic verification before AI automation.* - id: pypi-trusted-publishing title: PyPI Trusted Publishing workflow (manual, OIDC, no token) - status: PARTIAL - problem: Source install works; PyPI install reduces friction and signals maturity. 
- evidence: .github/workflows/publish.yml (workflow_dispatch only; environment-gated; OIDC). - remaining: A maintainer must create the PyPI Trusted Publisher + `pypi` GitHub environment, then trigger manually. Nothing publishes automatically. - human_review_required: yes # credentials / PyPI project ownership + status: DONE + evidence: .github/workflows/publish.yml (workflow_dispatch only; environment-gated; OIDC); `dorian-vwp` 1.0.0 is live on PyPI (`pip install dorian-vwp`). + acceptance_criteria: PyPI Trusted Publisher + `pypi` GitHub environment configured; a tagged release published `dorian-vwp` to PyPI via OIDC (no token). + human_review_required: no confidence: high - id: public-microbenchmark-execution diff --git a/tests/test_version_sync.py b/tests/test_version_sync.py index 703467b..8a45639 100644 --- a/tests/test_version_sync.py +++ b/tests/test_version_sync.py @@ -45,3 +45,42 @@ def test_readme_release_badge_is_dynamic_not_hardcoded() -> None: assert "img.shields.io/github/v/release/" in readme # and no stale hardcoded release badge like .../badge/release-v0.9-... slipped in assert not re.search(r"badge/release-v?\d+\.\d+", readme), "hardcoded version badge found" + + +# Live doc surfaces that must reflect the shipped PyPI release. dorian-vwp 1.0.0 +# went live on PyPI 2026-06-16; docs that still say the release hasn't happened +# (pre-PyPI "install from source until..." framing, or an rc2 latest stamp) are +# self-refuting for a verification tool. Historical references in CHANGELOG and +# archive/ are legitimate provenance and are deliberately NOT scanned here. +_LIVE_PYPI_DOC_SURFACES = ( + "README.md", + "action/action.yml", + "action/README.md", + "docs/BENCHMARK_CURRENT.md", + "docs/ROADMAP_BACKLOG.md", +) + +# Phrases that deny the shipped release. 
The "first PyPI release" family is +# matched case-insensitively so capitalized ("Until the first PyPI release") and +# lowercase ("until the first PyPI release") variants are both caught; the rc2 +# stamp is matched literally. +_STALE_PREPYPI_PHRASES = ( + "first PyPI release is on the roadmap", + "after the first PyPI release", + "until the first PyPI release", +) +_STALE_RC_LITERAL = "v1.0.0rc2" + + +def test_no_stale_prepypi_or_rc_vocabulary_in_live_docs() -> None: + offenders: list[str] = [] + for rel in _LIVE_PYPI_DOC_SURFACES: + text = (REPO_ROOT / rel).read_text(encoding="utf-8") + for lineno, line in enumerate(text.splitlines(), start=1): + lowered = line.lower() + for phrase in _STALE_PREPYPI_PHRASES: + if phrase.lower() in lowered: + offenders.append(f"{rel}:{lineno}: {phrase!r} -> {line.strip()}") + if _STALE_RC_LITERAL in line: + offenders.append(f"{rel}:{lineno}: {_STALE_RC_LITERAL!r} -> {line.strip()}") + assert not offenders, "stale pre-PyPI / rc2 vocabulary in live docs:\n" + "\n".join(offenders) From 498cb10ec45378cd88019935e87d3495a0ca7d66 Mon Sep 17 00:00:00 2001 From: Ajay Surya Date: Wed, 17 Jun 2026 08:57:46 +0530 Subject: [PATCH 02/18] docs(v1): promote runnable hero demo + repoint Demo badge (TASK-002) Move the copy-paste "Try it in 30 seconds" block above the fold as the hero, point the Demo badge anchor at it, and label the illustrative /login "60-second aha" clearly. Add a black-box test asserting the Demo badge resolves to the runnable heading (not the illustrative one) so the first-click demo path can't silently rot. Co-Authored-By: Claude Opus 4.8 (1M context) --- README.md | 70 +++++++++++++++++++----------------- tests/test_readme_example.py | 46 ++++++++++++++++++++++++ 2 files changed, 84 insertions(+), 32 deletions(-) diff --git a/README.md b/README.md index 3e518d3..5dc4610 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@

Quickstart - Demo + Demo GitHub Action

@@ -42,6 +42,7 @@ now and is re-checked on every future change, so a confident summary doesn't qui ## Table of contents +- [Try it in 30 seconds](#try-it-in-30-seconds) - [The 60-second aha](#the-60-second-aha) - [We ran this on dorian itself](#we-ran-this-on-dorian-itself) - [About](#about) @@ -60,10 +61,42 @@ now and is re-checked on every future change, so a confident summary doesn't qui - [License](#license) - [Contact](#contact) +## Try it in 30 seconds + +A self-contained run on a throwaway repo — copy-paste it; it leaves nothing behind but a +temp directory. (This exact sequence is pinned by a black-box test, so it is executable and +kept working, not just illustrative.) + +```bash +tmp=$(mktemp -d) && cd "$tmp" && git init -q +printf 'def handler():\n return 200\n' > app.py +printf '# change note\n\n`handler()` lives in app.py.\n' > note.md +git add -A && git commit -q -m "app + note" + +cat > claims.json <<'JSON' +{"claims": [ + {"id": "handler-exists", "text": "handler() lives in app.py.", + "kind": "behavior", "load_bearing": true, + "checkers": [{"type": "C3", "program": "symbol:app.py::handler"}]} +]} +JSON + +dorian verify note.md --claims claims.json # -> verified 1/1 claim(s) (exit 0) + +# now a refactor renames the function the note claims exists: +printf 'def renamed():\n return 200\n' > app.py +dorian revalidate --since HEAD # -> handler-exists BROKEN; WARRANTED -> REVOKED (exit 4) +``` + +`note.md` never changed and `git`/CI stay quiet — but the warrant flips to REVOKED, naming +the exact claim that stopped being true. (Don't have `dorian` yet? See +[Getting started](#getting-started).) 
+ ## The 60-second aha -An agent finishes a change and emits the claims it just made — a `claims.json` next to the work, -each claim bound to a read-only deterministic checker: +*(Illustrative — these files are not in your checkout; run the copy-paste demo above to try it +yourself.)* An agent finishes a change and emits the claims it just made — a `claims.json` next to +the work, each claim bound to a read-only deterministic checker: ```json { @@ -333,35 +366,8 @@ jobs: install: 'dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git' ``` -### Try it in 30 seconds - -A self-contained run on a throwaway repo — copy-paste it; it leaves nothing behind but a -temp directory. (This exact sequence is pinned by a black-box test, so it is executable and -kept working, not just illustrative.) - -```bash -tmp=$(mktemp -d) && cd "$tmp" && git init -q -printf 'def handler():\n return 200\n' > app.py -printf '# change note\n\n`handler()` lives in app.py.\n' > note.md -git add -A && git commit -q -m "app + note" - -cat > claims.json <<'JSON' -{"claims": [ - {"id": "handler-exists", "text": "handler() lives in app.py.", - "kind": "behavior", "load_bearing": true, - "checkers": [{"type": "C3", "program": "symbol:app.py::handler"}]} -]} -JSON - -dorian verify note.md --claims claims.json # -> verified 1/1 claim(s) (exit 0) - -# now a refactor renames the function the note claims exists: -printf 'def renamed():\n return 200\n' > app.py -dorian revalidate --since HEAD # -> handler-exists BROKEN; WARRANTED -> REVOKED (exit 4) -``` - -`note.md` never changed and `git`/CI stay quiet — but the warrant flips to REVOKED, naming -the exact claim that stopped being true. +Now that `dorian` is installed, the copy-paste runnable demo at the top — +[Try it in 30 seconds](#try-it-in-30-seconds) — runs end to end against a throwaway repo. 
## Writing claims an agent can be held to diff --git a/tests/test_readme_example.py b/tests/test_readme_example.py index 93866c4..efaccbb 100644 --- a/tests/test_readme_example.py +++ b/tests/test_readme_example.py @@ -10,6 +10,7 @@ from __future__ import annotations import os +import re import subprocess import sys from pathlib import Path @@ -77,3 +78,48 @@ def test_readme_still_contains_the_runnable_commands() -> None: assert "dorian verify note.md --claims claims.json" in readme assert "dorian revalidate --since HEAD" in readme assert "symbol:app.py::handler" in readme + + +def _github_slug(heading_text: str) -> str: + """GitHub-style anchor slug for a markdown heading (lowercase, drop punctuation, spaces->-).""" + slug = heading_text.strip().lower() + slug = re.sub(r"[^\w\s-]", "", slug) # drop punctuation, keep word chars / space / hyphen + slug = re.sub(r"\s+", "-", slug) + return slug + + +def test_readme_demo_badge_points_at_the_runnable_demo() -> None: + """The top "Demo" badge must anchor to a REAL, runnable heading — not the illustrative one. + + A new reader who clicks "Demo" lands on the first hands-on section; if that section is the + copy-paste-fails "60-second aha" the demo path is broken. We resolve the badge's `#anchor` + to an actual heading via GitHub-style slugs and assert it is the runnable "Try it" section. + """ + readme = (REPO_ROOT / "README.md").read_text(encoding="utf-8") + + # the badge link wraps the "Demo" shields.io image: ...alt="Demo"... 
+ m = re.search(r'<a href="#([^"]+)">\s*<img[^>]*alt="Demo"', readme) + assert m is not None, "could not find the Demo badge link in README.md" + badge_anchor = m.group(1) + + # GitHub-style slug of every ## / ### heading + heading_slugs = { + _github_slug(text): text + for text in re.findall(r"^#{2,3}\s+(.+?)\s*$", readme, flags=re.MULTILINE) + } + + # the anchor must resolve to a heading that actually exists + assert badge_anchor in heading_slugs, ( + f"Demo badge anchor #{badge_anchor} does not match any README heading; " + f"headings are: {sorted(heading_slugs)}" + ) + + target_heading = heading_slugs[badge_anchor] + # ...and it must be the runnable demo, not the illustrative "60-second aha" + assert "60-second aha" not in target_heading.lower(), ( + f"Demo badge points at the illustrative section ({target_heading!r}); " + "it must point at the runnable copy-paste demo." + ) + assert "try it" in target_heading.lower(), ( + f"Demo badge should point at the runnable 'Try it' demo, got {target_heading!r}" + ) From 1fa09eed42c2bf11842f54dae22f139e576b8a7e Mon Sep 17 00:00:00 2001 From: Ajay Surya Date: Wed, 17 Jun 2026 08:59:38 +0530 Subject: [PATCH 03/18] fix(security): reject option-like pytest checker nodeids (TASK-005a) A C4 program `pytest:-pevil` / `pytest:--collect-only` reached pytest as an OPTION (-p/-c/--collect-only), not a file, because the file part was only checked for emptiness and repo-containment. Reject any nodeid whose file part is empty or starts with '-' as ERROR(bad_program) before .resolve() and before any subprocess spawn. Red-green test asserts the three option-like nodeids ERROR with zero subprocess spawns; legitimate nodeids are unaffected (file parts never start with '-'). 
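
The guard described above can be sketched in isolation. This is a minimal illustration, not the patched code itself: the helper name `is_option_like_nodeid` is hypothetical (the real condition lives inline in `c4_test.check`), the input is assumed to be the nodeid with the `pytest:` prefix already stripped, and `tests/test_c4.py::test_ok` is a made-up nodeid used only as a benign example.

```python
def is_option_like_nodeid(nodeid: str) -> bool:
    """Return True when the file part of a pytest nodeid is empty or starts with '-'.

    Hypothetical helper for illustration only; it mirrors the inline condition
    `not file or file.startswith("-")` added by this patch.
    """
    # the file part is everything before the first '::'
    file, _sep, _rest = nodeid.partition("::")
    return not file or file.startswith("-")


# the three hostile shapes from the red-green test, plus one benign (made-up) nodeid
for candidate in ("-pevil", "--collect-only", "-c/x.ini::test_a", "tests/test_c4.py::test_ok"):
    print(f"{candidate!r}: rejected={is_option_like_nodeid(candidate)}")
```

Checking the prefix of the file part, rather than relying on a `--` separator in the argv, keeps the rejection ahead of both path resolution and any subprocess spawn.
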
Co-Authored-By: Claude Opus 4.8 (1M context) --- src/dorian/checkers/c4_test.py | 7 ++++++- tests/test_c4.py | 23 +++++++++++++++++++++++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/src/dorian/checkers/c4_test.py b/src/dorian/checkers/c4_test.py index 82b5ba5..a52738a 100644 --- a/src/dorian/checkers/c4_test.py +++ b/src/dorian/checkers/c4_test.py @@ -44,7 +44,12 @@ def check(ctx: CheckContext, spec: CheckerSpec) -> CheckResult: file, sep, rest = nodeid.partition("::") file = ctx.rename_map.get(file, file) - if not file or not (ctx.repo / file).resolve().is_relative_to(ctx.repo.resolve()): + if not file or file.startswith("-"): + # an empty or leading-dash file part would reach pytest as an OPTION + # (-p / -c / --collect-only), not a file; reject before .resolve() and + # before any subprocess (the argv carries no `--` fence) + return CheckResult(Verdict.ERROR, detail="bad_program") + if not (ctx.repo / file).resolve().is_relative_to(ctx.repo.resolve()): # a hostile nodeid ('..' or absolute) must not probe files outside the repo return CheckResult(Verdict.ERROR, detail="bad_program") nodeid = file + sep + rest diff --git a/tests/test_c4.py b/tests/test_c4.py index 7bdd06f..7f1d685 100644 --- a/tests/test_c4.py +++ b/tests/test_c4.py @@ -222,6 +222,29 @@ def test_nodeid_escaping_repo_is_error_bad_program(c4_repo): assert res.detail == "bad_program", prog +def test_leading_dash_nodeid_is_error_no_spawn(c4_repo, monkeypatch): + """A nodeid whose file part is empty or starts with '-' would reach pytest as + an OPTION (-p / -c / --collect-only), not a file, and pytest would act on it. 
+ It must be rejected as bad_program BEFORE any subprocess spawns (the argv has + no `--` fence, and `--` alone does not reliably fence option-looking values).""" + calls: list = [] + + class _ProbeSubprocess: + TimeoutExpired = subprocess.TimeoutExpired + + @staticmethod + def run(*args, **kwargs): + calls.append(args) + raise OSError("probe: pytest must not spawn for a hostile nodeid") + + monkeypatch.setattr(c4_mod, "subprocess", _ProbeSubprocess()) + for prog in ("pytest:-pevil", "pytest:--collect-only", "pytest:-c/x.ini::test_a"): + res = run_c4(c4_repo, prog) + assert res.verdict is Verdict.ERROR, prog + assert res.detail == "bad_program", prog + assert calls == [], "no pytest subprocess may spawn for a leading-dash nodeid" + + # --- revalidation ordering: C4 is the most expensive checker ---------------------- From 7f59a1f00a2a8bf4a2878b8ddb97858059765f3c Mon Sep 17 00:00:00 2001 From: Ajay Surya Date: Wed, 17 Jun 2026 09:08:24 +0530 Subject: [PATCH 04/18] docs: add reproducible real cross-PR catch log (TASK-003) One documented, independently-reproduced cross-PR catch on a public repo (encode/httpx, BSD-3): a load-bearing config-value claim sealed at commit A (requires-python ">=3.8") is flipped WARRANTED->REVOKED (exit 4) by a real later upstream PR (#3592 "Drop Python 3.8 support", >=3.8 -> >=3.9) while httpx's own test suite stays green (no test references requires-python; the PR diff touches no test file) and no stateless per-PR bot would re-open commit A's claim. Fills the previously-empty Entries section of the ledger template; honest-scope caveats and full reproduction included. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/REAL_CATCH_LOG.md | 140 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 137 insertions(+), 3 deletions(-) diff --git a/docs/REAL_CATCH_LOG.md b/docs/REAL_CATCH_LOG.md index 5c33b69..657600f 100644 --- a/docs/REAL_CATCH_LOG.md +++ b/docs/REAL_CATCH_LOG.md @@ -45,6 +45,140 @@ checker could not confirm — trigger-vs-truth gap), or **weak-binding-warning** ## Entries -_None yet. This file ships empty on purpose: dorian has not yet accumulated real -external catches, and inventing them would violate [VALIDATION_HONESTY.md](VALIDATION_HONESTY.md). -The first honest entry here is worth more than any number in the benchmark docs._ +### 2026-06-17 — httpx `requires-python` floor `>=3.8` → `>=3.9` (real upstream PR #3592) + +- **Claim:** "httpx declares a minimum supported Python of 3.8 (pyproject project.requires-python is `">=3.8"`)." +- **Checker:** `C3 config-value:pyproject.toml:project.requires-python:">=3.8"` +- **Repo / project:** [`encode/httpx`](https://github.com/encode/httpx) (public-safe? **yes** — BSD-3-Clause, frozen public SHAs) +- **Source commit that sealed it:** `336204f0121a9aefdebac5cacd81f912bafe8057` (commit A) +- **Change that triggered revalidation:** `4fb9528c2f5ac000441c3634d297e77da23067cd` — real upstream **"Drop Python 3.8 support (#3592)"** by Alex Grönholm +- **Outcome:** **true-catch** +- **Verdict dorian gave:** **BROKEN** (`WARRANTED → REVOKED`, exit 4) +- **Would you have shipped the break otherwise?** **yes** — `requires-python` is packaging metadata covered by no test (`grep -rn requires-python tests/` is empty; B's diff touches 8 files, none under `tests/`), so httpx's CI is green at B; and a per-PR review bot is stateless — it has no memory of commit A's claim and PR #3592's own diff is self-consistent, so nothing re-opens the old note. 
+- **User time spent (setup + review):** ~10 minutes +- **Reviewer notes:** dogfood on a real public repo at frozen SHAs; **independently reproduced** end-to-end. The other three sealed claims (`Client` defined in `_client.py`, `Client` exported, version `0.28.1`) stayed VERIFIED — dorian narrowed revalidation to the **1 candidate** whose source (`pyproject.toml`) actually changed. Full captured output and reproduction below. + +#### Captured output + +**1. Seal at A — `dorian verify` (exit 0):** + +``` +$ dorian verify note.md --claims claims.json +sha256:7db02138b329729b4f84b20d37a1154e237c07993783750a3c26e3531334b8a2 +verified 4/4 claim(s) against current sources -> note.md.warrant +# exit 0 +``` + +The warrant id is `sha256(canonical_json(body))` — content-addressed, so it reproduces for +this exact `note.md` + `claims.json` against httpx at SHA A. + +**2. The real upstream drift — `git show 4fb9528 --stat`:** + +``` + Drop Python 3.8 support (#3592) + .github/workflows/publish.yml | 2 +- + .github/workflows/test-suite.yml | 2 +- + CHANGELOG.md | 6 ++++++ + README.md | 2 +- + docs/async.md | 2 +- + docs/index.md | 2 +- + pyproject.toml | 3 +-- + requirements.txt | 3 +-- + 8 files changed, 13 insertions(+), 9 deletions(-) + +# the pyproject.toml hunk: -requires-python = ">=3.8" +requires-python = ">=3.9" +``` + +**3. The drift is silent to the test suite** — PR #3592 touches no test file, and no test +references the key: + +``` +$ grep -rn "requires-python" tests/ +# (no matches) +``` + +**4. Re-check at B — `dorian revalidate --since A` (exit 4):** + +``` +$ dorian revalidate --since 336204f0121a9aefdebac5cacd81f912bafe8057 +checked 1 candidate claim(s) +BROKEN sha256:7db02138b329729b httpx-python-floor-38 C3: config_value_mismatch: project.requires-python +fold sha256:7db02138b329729b WARRANTED -> REVOKED +# exit 4 +``` + +**5. 
Resulting state — `dorian status` (exit 4):** + +``` +$ dorian status note.md +REVOKED note.md sha256:7db02138b329729b BROKEN=1 VERIFIED=3 +``` + +#### The change-note and claims (verbatim, so the run is reproducible) + +`note.md`: + +```markdown +# Change note: pin our integration to httpx's supported Python floor + +We depend on `httpx` and need our CI matrix to track the library's own support +window. As of this change, the facts our integration relies on are: + +- httpx's packaging declares a minimum Python of **3.8** (`project.requires-python` + is `">=3.8"` in `pyproject.toml`), so our service may still run on Python 3.8. +- The public `Client` class is defined in `httpx/_client.py`. +- `Client` is listed in the top-level `httpx` package exports (`httpx/__init__.py`). +- The pinned library version is `0.28.1` (`httpx/__version__.py`). + +If httpx raises its supported Python floor, our 3.8 CI lane must be dropped in the +same change — that is the load-bearing fact below. +``` + +`claims.json`: + +```json +{ + "claims": [ + {"id": "httpx-python-floor-38", "text": "httpx declares a minimum supported Python of 3.8 (pyproject project.requires-python is \">=3.8\").", + "kind": "quantity", "load_bearing": true, + "checkers": [{"type": "C3", "program": "config-value:pyproject.toml:project.requires-python:\">=3.8\""}]}, + {"id": "httpx-client-defined", "text": "The public Client class is defined in httpx/_client.py.", + "kind": "behavior", "load_bearing": true, + "checkers": [{"type": "C3", "program": "symbol:httpx/_client.py::Client"}]}, + {"id": "httpx-client-exported", "text": "Client is listed in the top-level httpx package exports.", + "kind": "behavior", "load_bearing": false, + "checkers": [{"type": "C3", "program": "string:httpx/__init__.py::\"Client\""}]}, + {"id": "httpx-version-0281", "text": "The pinned httpx version is 0.28.1.", + "kind": "quantity", "load_bearing": false, + "checkers": [{"type": "C3", "program": 
"py-const:httpx/__version__.py::__version__::\"0.28.1\""}]} + ] +} +``` + +#### Reproduce it yourself (public repo, frozen SHAs) + +```bash +pip install dorian-vwp +git clone https://github.com/encode/httpx && cd httpx +git checkout -b dorian-catch 336204f0121a9aefdebac5cacd81f912bafe8057 # A +# write note.md and claims.json exactly as above, then: +dorian verify note.md --claims claims.json # -> verified 4/4, exit 0 +git add note.md note.md.warrant claims.json && git commit -m "seal at A" +git cherry-pick 4fb9528c2f5ac000441c3634d297e77da23067cd # real upstream B: Drop Python 3.8 (#3592) +dorian revalidate --since 336204f0121a9aefdebac5cacd81f912bafe8057 +# -> httpx-python-floor-38 BROKEN; WARRANTED -> REVOKED; exit 4 +dorian status note.md # -> REVOKED BROKEN=1 VERIFIED=3 +``` + +#### Honest scope — what this does and does **not** show + +**Does show:** on a real public repo, a load-bearing claim sealed at A was flipped to REVOKED +by a real, unrelated later commit, deterministically and reproducibly, when no test, no CI +signal, and no stateless per-PR review would have re-opened it. + +**Does not show:** that dorian flags drift it was not bound to, that one example "proves" httpx +correct, or that this result extrapolates. It re-checks **only the properties you explicitly +bound** — here, one config value. A claim bound only to a structural existence checker (e.g. `symbol:`) would **not** +flip if a function were gutted while keeping its name (the trigger≠truth / gutted-body +ceiling — see [WRITING_GOOD_CLAIMS.md](WRITING_GOOD_CLAIMS.md)). This is **one documented +catch**, not a benchmark; inventing more would violate [VALIDATION_HONESTY.md](VALIDATION_HONESTY.md). 
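
The gutted-body ceiling mentioned above can be shown with a small sketch. It assumes a name-only existence check built on the standard `ast` module; `defines_symbol` is a hypothetical stand-in and not dorian's `symbol:` checker, and the three source strings are invented for the example.

```python
import ast


def defines_symbol(source: str, name: str) -> bool:
    """Hypothetical existence check: does the module define `name` at top level?"""
    tree = ast.parse(source)
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name == name
        for node in tree.body
    )


ORIGINAL = "def handler():\n    return 200\n"
GUTTED = "def handler():\n    pass\n"  # same name, behavior removed
RENAMED = "def renamed():\n    return 200\n"

print(defines_symbol(ORIGINAL, "handler"))  # True
print(defines_symbol(GUTTED, "handler"))    # True: the name survives, the behavior does not
print(defines_symbol(RENAMED, "handler"))   # False
```

A name-only check still passes for the gutted body, which is why a claim about behavior needs a checker bound to behavior (for example a C4 pytest nodeid) rather than to existence alone.
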
From 30e68cf75a4964ba641f5cd7577004e99b4d0299 Mon Sep 17 00:00:00 2001 From: Ajay Surya Date: Wed, 17 Jun 2026 09:11:20 +0530 Subject: [PATCH 05/18] perf(v1): build symbol/config indexes once per verify (TASK-005b) cmd_verify rebuilt the whole-repo Python symbol-definer index 2x and the config-key index 3x per run, each a full git ls-files + AST/TOML/JSON walk. Both are pure functions of the repo tree, so build each ONCE at the top of cmd_verify and thread the precomputed copy through claim_watch_paths, ambiguous_symbol_mentions and ambiguous_config_mentions (extending the existing optional-`definers` pattern with a None sentinel for the non-git degrade path). Output is byte-identical: a call-count spy test pins each builder to <=1x per verify, and the existing watch/read-set/exit-code assertions (Tests A-D, test_verify) prove the verify result is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) --- src/dorian/commands.py | 16 ++++++++--- src/dorian/symbol_index.py | 57 +++++++++++++++++++++++++++----------- tests/test_symbol_index.py | 27 ++++++++++++++++++ 3 files changed, 80 insertions(+), 20 deletions(-) diff --git a/src/dorian/commands.py b/src/dorian/commands.py index d32b792..dfd42d3 100644 --- a/src/dorian/commands.py +++ b/src/dorian/commands.py @@ -247,17 +247,25 @@ def cmd_verify(args: argparse.Namespace) -> int: # claims mention (even when no checker named them): the symbol-definer watch # the seal adds is then also captured + hashed + scope-linted honestly paths = referenced_paths(claims) + # build each whole-repo index ONCE and thread it into every consumer below; each + # builder is a pure function of the repo tree, so a shared copy yields byte-identical + # watches/warnings to rebuilding per call (TASK-005b). `definers` is None only on a + # non-git repo, where each symbol helper self-degrades to {} exactly as before. 
+ try: + definers = symbol_index.python_symbol_definers(repo) + except gitio.GitError: + definers = None + config_index, unparseable_config = symbol_index.config_key_index(repo) # multi-index binding: Python symbol-definers + pyproject scripts + config keys - symbol_watch = symbol_index.claim_watch_paths(repo, claims) + symbol_watch = symbol_index.claim_watch_paths(repo, claims, definers, config_index) for path in sorted({p for ps in symbol_watch.values() for p in ps}): if path not in paths: paths.append(path) readset = parse_manual(paths, repo) # a load-bearing claim naming an AMBIGUOUS symbol/config key (>1 definer) is left # unbound; do not let that skip be silent — warn so the author binds it explicitly - ambiguous = symbol_index.ambiguous_symbol_mentions(repo, claims) - ambiguous_config = symbol_index.ambiguous_config_mentions(repo, claims) - _, unparseable_config = symbol_index.config_key_index(repo) + ambiguous = symbol_index.ambiguous_symbol_mentions(repo, claims, definers) + ambiguous_config = symbol_index.ambiguous_config_mentions(repo, claims, config_index) except (ValueError, OSError, gitio.GitError) as exc: print(f"dorian verify: {exc}", file=sys.stderr) return EXIT_USAGE diff --git a/src/dorian/symbol_index.py b/src/dorian/symbol_index.py index ad2ba19..eef601d 100644 --- a/src/dorian/symbol_index.py +++ b/src/dorian/symbol_index.py @@ -181,7 +181,11 @@ def pyproject_script_definers( return out -def claim_symbol_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple[str, ...]]: +def claim_symbol_watch_paths( + repo: Path, + claims: list[Claim], + definers: dict[str, tuple[str, ...]] | None = None, +) -> dict[str, tuple[str, ...]]: """claim id -> the sorted, unique defining files to add to that claim's watch set: for every identifier-shaped token in the claim text that names a symbol defined in EXACTLY ONE file. 
Claims mentioning no such symbol are omitted @@ -197,10 +201,12 @@ def claim_symbol_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple } if not any(claim_tokens.values()): return {} - try: - index = python_symbol_definers(repo) - except gitio.GitError: - return {} + if definers is None: + try: + definers = python_symbol_definers(repo) + except gitio.GitError: + return {} + index = definers scripts = pyproject_script_definers(repo, index) # console-script name -> target file out: dict[str, tuple[str, ...]] = {} for claim in claims: @@ -266,7 +272,11 @@ def config_key_index(repo: Path) -> tuple[dict[str, tuple[str, ...]], tuple[str, return ({k: tuple(sorted(v)) for k, v in sorted(keys.items())}, tuple(sorted(unparseable))) -def claim_config_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple[str, ...]]: +def claim_config_watch_paths( + repo: Path, + claims: list[Claim], + config_index: dict[str, tuple[str, ...]] | None = None, +) -> dict[str, tuple[str, ...]]: """claim id -> the config file(s) to add to its watch set: for every identifier-shaped token in the claim text that is a config key defined in EXACTLY ONE tracked .toml/.json. Ambiguous keys (>1 file) are skipped (see ambiguous_config_mentions). 
Additive and @@ -274,7 +284,7 @@ def claim_config_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple claim_tokens = {c.id: _tokens(c.text) for c in claims if isinstance(c.text, str)} if not any(claim_tokens.values()): return {} - index, _ = config_key_index(repo) + index = config_index if config_index is not None else config_key_index(repo)[0] out: dict[str, tuple[str, ...]] = {} for claim in claims: paths: set[str] = set() @@ -289,25 +299,35 @@ def claim_config_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple return out -def claim_watch_paths(repo: Path, claims: list[Claim]) -> dict[str, tuple[str, ...]]: +def claim_watch_paths( + repo: Path, + claims: list[Claim], + definers: dict[str, tuple[str, ...]] | None = None, + config_index: dict[str, tuple[str, ...]] | None = None, +) -> dict[str, tuple[str, ...]]: """All deterministic re-check watches dorian binds per claim: Python symbol-definer files + pyproject console scripts (claim_symbol_watch_paths) UNION config-key files (claim_config_watch_paths). Union, sorted, deduped. Conservative and additive — it only ever widens the re-check trigger set; it never proves a claim true.""" merged: dict[str, set[str]] = {} - for source in (claim_symbol_watch_paths(repo, claims), claim_config_watch_paths(repo, claims)): + for source in ( + claim_symbol_watch_paths(repo, claims, definers), + claim_config_watch_paths(repo, claims, config_index), + ): for cid, paths in source.items(): merged.setdefault(cid, set()).update(paths) return {cid: tuple(sorted(paths)) for cid, paths in merged.items()} def ambiguous_config_mentions( - repo: Path, claims: list[Claim] + repo: Path, + claims: list[Claim], + config_index: dict[str, tuple[str, ...]] | None = None, ) -> dict[str, dict[str, tuple[str, ...]]]: """claim id -> {config key: defining files} for keys a LOAD-BEARING claim mentions that are defined in MORE THAN ONE tracked config file — the ambiguous case binding skips. 
Lets verify/bind-suggest surface the skip rather than guess. {} if none.""" - index, _ = config_key_index(repo) + index = config_index if config_index is not None else config_key_index(repo)[0] out: dict[str, dict[str, tuple[str, ...]]] = {} for claim in claims: if not claim.load_bearing or not isinstance(claim.text, str): @@ -323,7 +343,9 @@ def ambiguous_config_mentions( def ambiguous_symbol_mentions( - repo: Path, claims: list[Claim] + repo: Path, + claims: list[Claim], + definers: dict[str, tuple[str, ...]] | None = None, ) -> dict[str, dict[str, tuple[str, ...]]]: """claim id -> {symbol: defining files} for symbols a LOAD-BEARING claim's text mentions that are defined in MORE THAN ONE tracked file — the ambiguous case the @@ -331,10 +353,13 @@ def ambiguous_symbol_mentions( false precision). Lets `verify` / `bindings` surface that skip loudly and reviewably instead of hiding it; it never adds a watch. {} on a non-git repo. """ - try: - index = python_symbol_definers(repo) - except gitio.GitError: - return {} + if definers is None: + try: + index = python_symbol_definers(repo) + except gitio.GitError: + return {} + else: + index = definers out: dict[str, dict[str, tuple[str, ...]]] = {} for claim in claims: if not claim.load_bearing or not isinstance(claim.text, str): diff --git a/tests/test_symbol_index.py b/tests/test_symbol_index.py index dc5af15..671798f 100644 --- a/tests/test_symbol_index.py +++ b/tests/test_symbol_index.py @@ -51,6 +51,33 @@ def _warrant(repo: Path) -> Warrant: return Warrant.load(repo / "docs/design.md.warrant") +# --- TASK-005b: each whole-repo index is built at most once per verify ----------------- + + +def test_verify_builds_each_index_once(fixture_repo: Path, monkeypatch) -> None: + """The symbol/config indexes are pure functions of the repo tree, so a single + `dorian verify` must build each AT MOST once and thread it through every + consumer (was 2x python_symbol_definers + 3x config_key_index per verify).""" + calls = {"py": 0, 
"cfg": 0} + real_py = symbol_index.python_symbol_definers + real_cfg = symbol_index.config_key_index + + def spy_py(repo: Path): + calls["py"] += 1 + return real_py(repo) + + def spy_cfg(repo: Path): + calls["cfg"] += 1 + return real_cfg(repo) + + monkeypatch.setattr(symbol_index, "python_symbol_definers", spy_py) + monkeypatch.setattr(symbol_index, "config_key_index", spy_cfg) + + assert _verify(fixture_repo, [LOGIN_CLAIM]) == 0 # default path (no --binding-gate) + assert calls["py"] == 1, f"python_symbol_definers ran {calls['py']}x (want 1)" + assert calls["cfg"] == 1, f"config_key_index ran {calls['cfg']}x (want 1)" + + # --- Test A: watch the definer even when the claim text never names the file ---------- From 6d671ddd61f3043928f76fd3c90e12ed027ed494 Mon Sep 17 00:00:00 2001 From: Ajay Surya Date: Wed, 17 Jun 2026 09:14:23 +0530 Subject: [PATCH 06/18] fix(security): bound sqlite reconcile checks by a per-query timeout (TASK-014) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Typed C5 reads are deliberately not deny-exec-gated (only shell: is), so a pathological reconcile query the read-only authorizer permits — e.g. an infinite recursive CTE (SQLITE_RECURSIVE is a read op) — could hang the process even under --deny-exec. Install a progress handler with a monotonic wall-clock deadline (_SQLITE_QUERY_TIMEOUT_S, 5s) that interrupts the query; a deadline interrupt is distinguished from a genuine bad query via a timed_out flag and surfaces as ERROR(query_timeout), never FAIL/PASS. The read-only authorizer is preserved. Red-green test: a recursive CTE ERRORs in well under the bound with the cap patched tiny. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- src/dorian/checkers/c5_data.py | 31 +++++++++++++++++++++++++++++++ tests/test_c5.py | 24 ++++++++++++++++++++++++ 2 files changed, 55 insertions(+) diff --git a/src/dorian/checkers/c5_data.py b/src/dorian/checkers/c5_data.py index c5754ac..48a7095 100644 --- a/src/dorian/checkers/c5_data.py +++ b/src/dorian/checkers/c5_data.py @@ -21,6 +21,7 @@ import csv import re import sqlite3 +import time from pathlib import Path from dorian.checkers import registry @@ -44,6 +45,10 @@ class _ColumnGone(Exception): pass +class _QueryTimeout(Exception): + pass + + _OPS = { "==": lambda a, b: a == b, "!=": lambda a, b: a != b, @@ -220,6 +225,13 @@ def _deny_non_read(op: int, *_: object) -> int: return sqlite3.SQLITE_OK if op in _SQLITE_READ_OPS else sqlite3.SQLITE_DENY +# typed C5 sqlite reads are deliberately NOT deny-exec-gated (only shell: is), so a +# pathological query — e.g. an infinite recursive CTE the read-only authorizer permits +# (SQLITE_RECURSIVE is a read op) — could otherwise hang the process even under +# --deny-exec. Bound every reconcile query by a wall-clock deadline -> ERROR on timeout. +_SQLITE_QUERY_TIMEOUT_S = 5.0 + + def _reconcile_side(ctx: CheckContext, side: str) -> int: engine, _, body = side.strip().partition(":") if engine == "csv": @@ -231,10 +243,27 @@ def _reconcile_side(ctx: CheckContext, side: str) -> int: raise _BadProgram(f"sqlite side expects '::