Merged
18 commits
3548754
docs(v1): correct live PyPI/rc2 doc drift + extend doc-drift guard (T…
ajay-dev-2112 Jun 17, 2026
498cb10
docs(v1): promote runnable hero demo + repoint Demo badge (TASK-002)
ajay-dev-2112 Jun 17, 2026
1fa09ee
fix(security): reject option-like pytest checker nodeids (TASK-005a)
ajay-dev-2112 Jun 17, 2026
7f59a1f
docs: add reproducible real cross-PR catch log (TASK-003)
ajay-dev-2112 Jun 17, 2026
30e68cf
perf(v1): build symbol/config indexes once per verify (TASK-005b)
ajay-dev-2112 Jun 17, 2026
6d671dd
fix(security): bound sqlite reconcile checks by a per-query timeout (…
ajay-dev-2112 Jun 17, 2026
e1bb89e
ci(security): pin actions to SHAs + add SCA/SAST + Dependabot (TASK-013)
ajay-dev-2112 Jun 17, 2026
135e5a9
feat: export a warrant as an in-toto ClaimVerification predicate (TAS…
ajay-dev-2112 Jun 17, 2026
cbc002f
docs: add guidance for writing checkable claims (TASK-006)
ajay-dev-2112 Jun 17, 2026
210b465
docs: sharpen the Claude Code warranted-claims recipe (TASK-008)
ajay-dev-2112 Jun 17, 2026
a0081d5
docs: add consolidated safe public-fork runner guidance (TASK-009)
ajay-dev-2112 Jun 17, 2026
0c15c0c
docs: reconcile public benchmark protocol with what shipped (TASK-012)
ajay-dev-2112 Jun 17, 2026
6350eae
feat: add deterministic suggest-claims scaffolding (TASK-016)
ajay-dev-2112 Jun 17, 2026
9461930
docs: wire the real-catch proof + new docs into the README (TASK-004)
ajay-dev-2112 Jun 17, 2026
fc9cfcd
docs: design note for broadening the public benchmark to C4+C5 (TASK-…
ajay-dev-2112 Jun 17, 2026
15b438e
style: ruff-format test_suggest_claims (TASK-016 follow-up)
ajay-dev-2112 Jun 17, 2026
81cebbc
release: prepare v1.0.1
ajay-dev-2112 Jun 17, 2026
814b046
docs: re-stamp BENCHMARK_CURRENT at 1.0.1 (numbers re-confirmed, unch…
ajay-dev-2112 Jun 17, 2026
14 changes: 14 additions & 0 deletions .github/dependabot.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
version: 2
updates:
# Keep the SHA-pinned GitHub Actions fresh: Dependabot bumps each `@<sha> # vX`
# to the next version's SHA with an updated comment, so pinning does not freeze
# us on a stale (potentially vulnerable) action version.
- package-ecosystem: github-actions
directory: "/"
schedule:
interval: weekly
# Python deps (reads pyproject.toml: dev group + optional extras).
- package-ecosystem: pip
directory: "/"
schedule:
interval: weekly
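
The pinning convention described in the comments above (`@<sha> # vX`) can be checked mechanically. The following is a minimal sketch, not part of this PR: the function name `unpinned_actions` and the regexes are assumptions made for illustration, and it only inspects plain `uses:` lines in workflow text.

```python
import re

# Matches `uses: owner/repo[/path]@ref`, capturing the action and the ref.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^@\s]+)@(\S+)")
# A full-length commit SHA is exactly 40 lowercase hex characters.
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text: str) -> list[tuple[str, str]]:
    """Return (action, ref) pairs whose ref is not a full 40-hex commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match and not FULL_SHA_RE.fullmatch(match.group(2)):
            found.append((match.group(1), match.group(2)))
    return found


# Hypothetical workflow fragment mixing pinned and floating refs.
sample = """\
steps:
  - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
  - uses: actions/cache@v4
  - uses: pypa/gh-action-pypi-publish@release/v1
"""

print(unpinned_actions(sample))
# [('actions/cache', 'v4'), ('pypa/gh-action-pypi-publish', 'release/v1')]
```

A check like this could run in CI so that a newly added floating tag is noticed before merge; Dependabot then keeps the pinned SHAs current.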
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -13,8 +13,8 @@ jobs:
matrix:
python: ["3.11", "3.12", "3.13"]
steps:
- uses: actions/checkout@v6.0.3
- uses: astral-sh/setup-uv@v8.2.0
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
- uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # v8.2.0
with: { python-version: "${{ matrix.python }}" }
- run: uv sync --all-extras
- run: uv run ruff check src tests bench
8 changes: 4 additions & 4 deletions .github/workflows/public-microbench.yml
@@ -27,14 +27,14 @@ jobs:
microbench:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6.0.3
- uses: astral-sh/setup-uv@v8.2.0
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
- uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # v8.2.0
with: { python-version: "3.12" }
- run: uv sync --all-extras

# Cache cloned public subjects keyed by the manifest's frozen SHAs, so a re-run is offline.
- name: Cache frozen subject checkouts
uses: actions/cache@v4
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4
with:
path: .bench/public-work/checkouts
key: public-microbench-checkouts-${{ hashFiles('bench/public/manifest.v1.yaml') }}
@@ -55,7 +55,7 @@ jobs:

- name: Upload results
if: always()
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: public-microbench-results
path: bench/public/results/
10 changes: 5 additions & 5 deletions .github/workflows/publish-testpypi.yml
@@ -27,17 +27,17 @@ jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6.0.3
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
with:
ref: ${{ inputs.ref }}
- uses: actions/setup-python@v5
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with: { python-version: "3.12" }
- name: Build sdist + wheel
run: |
python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*
- uses: actions/upload-artifact@v4
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: dist
path: dist/
@@ -51,11 +51,11 @@ jobs:
permissions:
id-token: write # OIDC: mint a short-lived token, no stored secret
steps:
- uses: actions/download-artifact@v4
- uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: dist
path: dist/
- name: Publish to TestPyPI (Trusted Publishing dry-run)
uses: pypa/gh-action-pypi-publish@release/v1
uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0
with:
repository-url: https://test.pypi.org/legacy/
10 changes: 5 additions & 5 deletions .github/workflows/publish.yml
@@ -26,10 +26,10 @@ jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
ref: ${{ inputs.ref }}
- uses: actions/setup-python@v5
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with:
python-version: "3.12"
- name: Build sdist + wheel
@@ -48,7 +48,7 @@ jobs:
PY
env:
REF: ${{ inputs.ref }}
- uses: actions/upload-artifact@v4
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: dist
path: dist/
@@ -62,9 +62,9 @@ jobs:
permissions:
id-token: write # OIDC: mint a short-lived token, no stored secret
steps:
- uses: actions/download-artifact@v4
- uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: dist
path: dist/
- name: Publish to PyPI (Trusted Publishing)
uses: pypa/gh-action-pypi-publish@release/v1
uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0
12 changes: 6 additions & 6 deletions .github/workflows/release-gate.yml
@@ -31,11 +31,11 @@ jobs:
matrix:
python: ["3.11", "3.12", "3.13"]
steps:
- uses: actions/checkout@v6.0.3
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
with:
ref: ${{ inputs.ref || github.ref }}
fetch-depth: 0
- uses: astral-sh/setup-uv@v8.2.0
- uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # v8.2.0
with: { python-version: "${{ matrix.python }}" }
- run: uv sync --all-extras
- run: uv run ruff check src tests bench
@@ -50,11 +50,11 @@ jobs:
id-token: write # OIDC for Sigstore signing — short-lived, no stored secret
attestations: write # write the build-provenance attestation
steps:
- uses: actions/checkout@v6.0.3
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
with:
ref: ${{ inputs.ref || github.ref }}
fetch-depth: 0
- uses: actions/setup-python@v5
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with: { python-version: "3.12" }
- name: Build sdist + wheel
run: |
@@ -78,10 +78,10 @@ jobs:
run: |
cd dist && sha256sum * | tee SHA256SUMS
- name: Attest build provenance
uses: actions/attest-build-provenance@v1
uses: actions/attest-build-provenance@ef244123eb79f2f7a7e75d99086184180e6d0018 # v1
with:
subject-path: "dist/*.whl, dist/*.tar.gz"
- uses: actions/upload-artifact@v4
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: dist
path: dist/
25 changes: 25 additions & 0 deletions .github/workflows/security.yml
@@ -0,0 +1,25 @@
name: security
on:
push: { branches: [main] }
pull_request:
schedule:
- cron: "0 6 * * 1" # weekly Monday 06:00 UTC — catch newly-disclosed CVEs
# Least privilege: this job only reads the repo and queries advisory data.
permissions:
contents: read
jobs:
sca-sast:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
- uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # v8.2.0
with: { python-version: "3.12" }
- run: uv sync --all-extras
# SCA: audit the resolved dependency tree (dev + extras) for known CVEs.
# Runtime deps are [], so the value is the dev/extras transitive trees.
- name: pip-audit (SCA)
run: uvx pip-audit
# SAST: static analysis of first-party source. Excludes the documented,
# policy-gated execution primitives via [tool.bandit] in pyproject.toml.
- name: bandit (SAST)
run: uvx bandit -c pyproject.toml -r src/
124 changes: 76 additions & 48 deletions README.md
@@ -14,7 +14,7 @@

<p>
<a href="#getting-started"><img src="https://img.shields.io/badge/Quickstart-2ea44f?style=for-the-badge" alt="Quickstart"></a>
<a href="#the-60-second-aha"><img src="https://img.shields.io/badge/Demo-1f6feb?style=for-the-badge" alt="Demo"></a>
<a href="#try-it-in-30-seconds"><img src="https://img.shields.io/badge/Demo-1f6feb?style=for-the-badge" alt="Demo"></a>
<a href="action/README.md"><img src="https://img.shields.io/badge/GitHub_Action-6e40c9?style=for-the-badge" alt="GitHub Action"></a>
</p>

@@ -42,6 +42,7 @@ now and is re-checked on every future change, so a confident summary doesn't qui

## Table of contents

- [Try it in 30 seconds](#try-it-in-30-seconds)
- [The 60-second aha](#the-60-second-aha)
- [We ran this on dorian itself](#we-ran-this-on-dorian-itself)
- [About](#about)
@@ -60,10 +61,42 @@ now and is re-checked on every future change, so a confident summary doesn't qui
- [License](#license)
- [Contact](#contact)

## Try it in 30 seconds

A self-contained run on a throwaway repo — copy-paste it; it leaves nothing behind but a
temp directory. (This exact sequence is pinned by a black-box test, so it is executable and
kept working, not just illustrative.)

```bash
tmp=$(mktemp -d) && cd "$tmp" && git init -q
printf 'def handler():\n return 200\n' > app.py
printf '# change note\n\n`handler()` lives in app.py.\n' > note.md
git add -A && git commit -q -m "app + note"

cat > claims.json <<'JSON'
{"claims": [
{"id": "handler-exists", "text": "handler() lives in app.py.",
"kind": "behavior", "load_bearing": true,
"checkers": [{"type": "C3", "program": "symbol:app.py::handler"}]}
]}
JSON

dorian verify note.md --claims claims.json # -> verified 1/1 claim(s) (exit 0)

# now a refactor renames the function the note claims exists:
printf 'def renamed():\n return 200\n' > app.py
dorian revalidate --since HEAD # -> handler-exists BROKEN; WARRANTED -> REVOKED (exit 4)
```

`note.md` never changed and `git`/CI stay quiet — but the warrant flips to REVOKED, naming
the exact claim that stopped being true. (Don't have `dorian` yet? See
[Getting started](#getting-started).)
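
To make the flip concrete, here is a minimal sketch of the kind of existence check a `symbol:app.py::handler` program implies. It is an assumption for illustration only: the helper `defines_symbol` is hypothetical and this is not dorian's actual C3 implementation, which may resolve symbols differently.

```python
import ast


def defines_symbol(source: str, name: str) -> bool:
    """True if `source` has a top-level function or class named `name`."""
    tree = ast.parse(source)
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name == name
        for node in tree.body
    )


# The two states of app.py from the demo above.
before = "def handler():\n    return 200\n"
after = "def renamed():\n    return 200\n"

print(defines_symbol(before, "handler"))  # True
print(defines_symbol(after, "handler"))   # False
```

The check reads only the file's syntax tree and never imports or runs `app.py`, which is consistent with the README's description of C3 checkers as inspecting files rather than executing code.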

## The 60-second aha

An agent finishes a change and emits the claims it just made — a `claims.json` next to the work,
each claim bound to a read-only deterministic checker:
*(Illustrative — these files are not in your checkout; run the copy-paste demo above to try it
yourself.)* An agent finishes a change and emits the claims it just made — a `claims.json` next to
the work, each claim bound to a read-only deterministic checker:

```json
{
@@ -121,6 +154,16 @@ those claims named made `dorian revalidate` flag exactly that claim `BROKEN` and
a committed artifact and not a benchmark figure — but it is evidence that the mechanism can catch
this kind of checked break on real code, for zero model tokens.

We have since recorded a **documented, reproducible cross-PR catch on a public repo**. A
load-bearing claim sealed against [`encode/httpx`](https://github.com/encode/httpx) at one
commit — `requires-python` is `">=3.8"` — was flipped `WARRANTED → REVOKED` (exit 4) by a real
*later* upstream PR ([#3592](https://github.com/encode/httpx/pull/3592), "Drop Python 3.8
support", which moved it to `">=3.9"`), while httpx's own test suite stayed green (no test
references `requires-python`) and no stateless per-PR review bot would have re-opened the
original claim. The full command output and a from-scratch reproduction on the public repo are
in [`docs/REAL_CATCH_LOG.md`](docs/REAL_CATCH_LOG.md) — one documented catch, with honest
scope, not a validation claim.

## About

An AI agent writes the code and then a confident account of what it did — a PR description, a commit
@@ -286,8 +329,15 @@ rebuildable at any time with `dorian sync` — and is never committed.

## Getting started

The distribution is `dorian-vwp`; the import and CLI are `dorian`. The first PyPI release is on the
roadmap — until it lands, install from source:
The distribution is `dorian-vwp`; the import and CLI are `dorian`. Install from PyPI:

```bash
pip install dorian-vwp # core, zero runtime dependencies
pip install 'dorian-vwp[data]' # + duckdb for parquet data claims
pip install 'dorian-vwp[extract]' # + anthropic for LLM claim drafting (frozen/experimental)
```

To install the latest unreleased changes, install from source instead:

```bash
pip install 'dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git'
@@ -297,14 +347,6 @@ pip install 'dorian-vwp[data] @ git+https://github.com/ajaysurya1221/dorian.git'
pip install 'dorian-vwp[extract] @ git+https://github.com/ajaysurya1221/dorian.git' # + anthropic for LLM claim drafting (frozen/experimental)
```

After the first PyPI release:

```bash
pip install dorian-vwp # core, zero runtime dependencies
pip install 'dorian-vwp[data]' # + duckdb for parquet data claims
pip install 'dorian-vwp[extract]' # + anthropic for LLM claim drafting (frozen/experimental)
```

Then run `dorian verify <artifact> --claims claims.json` on one change. For CI, add the composite
[GitHub Action](action/README.md) — it revalidates the claims a pull request touches and posts a
sticky PR comment. **Read its
@@ -334,35 +376,8 @@ jobs:
install: 'dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git'
```

### Try it in 30 seconds

A self-contained run on a throwaway repo — copy-paste it; it leaves nothing behind but a
temp directory. (This exact sequence is pinned by a black-box test, so it is executable and
kept working, not just illustrative.)

```bash
tmp=$(mktemp -d) && cd "$tmp" && git init -q
printf 'def handler():\n return 200\n' > app.py
printf '# change note\n\n`handler()` lives in app.py.\n' > note.md
git add -A && git commit -q -m "app + note"

cat > claims.json <<'JSON'
{"claims": [
{"id": "handler-exists", "text": "handler() lives in app.py.",
"kind": "behavior", "load_bearing": true,
"checkers": [{"type": "C3", "program": "symbol:app.py::handler"}]}
]}
JSON

dorian verify note.md --claims claims.json # -> verified 1/1 claim(s) (exit 0)

# now a refactor renames the function the note claims exists:
printf 'def renamed():\n return 200\n' > app.py
dorian revalidate --since HEAD # -> handler-exists BROKEN; WARRANTED -> REVOKED (exit 4)
```

`note.md` never changed and `git`/CI stay quiet — but the warrant flips to REVOKED, naming
the exact claim that stopped being true.
Now that `dorian` is installed, the copy-paste runnable demo at the top —
[Try it in 30 seconds](#try-it-in-30-seconds) — runs end to end against a throwaway repo.

## Writing claims an agent can be held to

@@ -374,13 +389,17 @@ shape-tolerant checks like `regex:`/`symbol:`/typed-C5 over brittle `string:`)
path/symbol/string/regex plus the V1 structural forms `py-signature:`/`py-const:` and the
comment/docstring-stripped `code:`, C4 `pytest:<nodeid>`, C5 typed data) are documented in
[`spec/checkers.md`](spec/checkers.md). What V1 strengthening does and does not promise is in
[`docs/V1_SCOPE.md`](docs/V1_SCOPE.md).
[`docs/V1_SCOPE.md`](docs/V1_SCOPE.md). Worked good/bad claim pairs — and the gutted-body
ceiling, where an existence check is too weak and you need a C4/C5 behavior check — are in
[`docs/WRITING_GOOD_CLAIMS.md`](docs/WRITING_GOOD_CLAIMS.md).

> **Checker programs are executable.** `dorian verify` *runs* every checker at seal time. C3 and typed
> C5 only inspect files, but C4 (`pytest:`) and C5 `shell:` execute code — review an agent-emitted
> `claims.json` exactly as you would review agent-emitted code, and never run `verify` on claims from
> an untrusted source. In untrusted contexts add `--deny-exec` to refuse the executable families
> (fail-closed, not a sandbox — see [SECURITY.md](SECURITY.md)).
> (fail-closed, not a sandbox — see [SECURITY.md](SECURITY.md)). For one copy-paste safe recipe for
> public/untrusted fork PRs (`checker_trust: base` + `deny_exec`), see
> [`docs/SECURITY_AND_SAFE_RUNNERS.md`](docs/SECURITY_AND_SAFE_RUNNERS.md).
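
As a pre-review aid, the executable families named above can be flagged before anything runs. The sketch below is an assumption for illustration and is not how `--deny-exec` is implemented: the helper names, the `KNOWN_TYPES` set, and the sample claims are hypothetical, and only the `type`/`program` fields shown in this README are used.

```python
# Checker families mentioned in the README; anything else is treated as unknown.
KNOWN_TYPES = {"C3", "C4", "C5"}


def is_refused(checker: dict) -> bool:
    """Fail closed: refuse unknown types, C4 (pytest), and C5 `shell:` programs."""
    ctype = checker.get("type")
    program = str(checker.get("program", ""))
    if ctype not in KNOWN_TYPES:
        return True
    if ctype == "C4":
        return True
    return ctype == "C5" and program.startswith("shell:")


def refused_claim_ids(claims_doc: dict) -> list[str]:
    """Ids of claims that carry at least one refused checker."""
    return [
        claim["id"]
        for claim in claims_doc.get("claims", [])
        if any(is_refused(c) for c in claim.get("checkers", []))
    ]


# Hypothetical claims file contents.
doc = {"claims": [
    {"id": "handler-exists",
     "checkers": [{"type": "C3", "program": "symbol:app.py::handler"}]},
    {"id": "tests-pass",
     "checkers": [{"type": "C4", "program": "pytest:tests/test_app.py::test_handler"}]},
    {"id": "script-ok",
     "checkers": [{"type": "C5", "program": "shell:./check.sh"}]},
]}

print(refused_claim_ids(doc))  # ['tests-pass', 'script-ok']
```

Like the flag it imitates, a filter of this kind is a refusal policy and not a sandbox: it decides what not to run, and gives no protection once an executable checker is allowed.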

## Command surface

@@ -416,6 +435,14 @@ claims.
refuses the re-seal (exit 4) rather than being laundered into a fresh trusted state.
- `dorian suggest-data-checks <path> [--columns ...] [--out f]` — born-verifiable C5 checker
suggestions from a data file's current state, for review and pasting into a claim's `checkers` list.
- `dorian suggest-claims <path.py> [--out f]` — born-verifiable C3 claim suggestions (`symbol:` for
defs/classes, `py-const:` for literal constants) for a Python file: each candidate is run and only
passing ones are emitted, `load_bearing` defaults to false, ambiguous symbols are skipped. Review
scaffolding (existence/value, not behavior) — see
[`docs/design/SUGGEST_CLAIMS.md`](docs/design/SUGGEST_CLAIMS.md).
- `dorian export --in-toto <artifact>` — project a sealed `.warrant` into an experimental in-toto
`ClaimVerification` Statement (deterministic, no signing, zero deps); experimental interop —
see [`docs/ATTESTATION_INTEROP.md`](docs/ATTESTATION_INTEROP.md).
- `dorian report --audit` — the full event log as `dorian-audit-v1` JSONL, byte-identical across
runs; checker details truncated to 160 chars to bound source-content carryover.
- `dorian revalidate --format md|json` — `md` is the PR-comment body posted by the
@@ -464,8 +491,9 @@ work perishable, so you find out when it expired.

## Roadmap

- **Real catches on real repos** — the dogfood above made the loop usable; next is using it daily and
recording the breaks it catches that would otherwise have shipped.
- **Real catches on real repos** — the loop is usable and the first documented cross-PR catch is
recorded ([`docs/REAL_CATCH_LOG.md`](docs/REAL_CATCH_LOG.md), on `encode/httpx`); next is using it
daily and recording more of the breaks it catches that would otherwise have shipped.
- **The binding gap, narrowed and measured** — a symbol→defining-file index now re-checks a claim
when its symbol's definer changes, closing the silent-skip *trigger* gap
([`docs/BENCHMARK_BINDING_LIFECYCLE.md`](docs/BENCHMARK_BINDING_LIFECYCLE.md)). What remains is the
@@ -479,8 +507,8 @@ work perishable, so you find out when it expired.
([`docs/BENCHMARK_PUBLIC_REAL_REPOS.md`](docs/BENCHMARK_PUBLIC_REAL_REPOS.md)). These are
**reproducible on those frozen SHAs only** — not a real-world performance claim; the trigger and
truth layers are reported separately.
- **PyPI trusted publishing** — tagged releases now ship (latest: **`v1.0.0rc2`**, a V1 release
candidate / prerelease); publishing `dorian-vwp` to PyPI via a Trusted Publisher is next.
- **PyPI trusted publishing** — `dorian-vwp` is published to PyPI via a Trusted Publisher
(latest: **`v1.0.0`**); `pip install dorian-vwp` installs the released package.

Non-goals stay non-goals: no servers, no dashboards, no hosted control plane, no model at check time.
Local-first is the design center.