11 changes: 11 additions & 0 deletions .gitignore
@@ -15,3 +15,14 @@ bench/real/
.DS_Store
/assets/
.env

# tool working dirs (not release content)
.claude/
.gitnexus/

# internal program/audit working docs — provenance only, never shipped in the release
/RESEARCH_REPORT_DORIAN_0_11_0.md
/V1_IMPLEMENTATION_TRACKER.md
/V1_ALIGNMENT_REPORT.md
/AUDIT_RELEASE_GATE.md
/GITHUB_RELEASE_NOTES.md
46 changes: 34 additions & 12 deletions README.md
@@ -36,8 +36,9 @@ now and is re-checked on every future change, so a confident summary doesn't qui
> commits, nothing else — with **zero model tokens at check time**, so the checker can't be talked
> past by the code it verifies. Because checker programs are *executable* (C4 runs `pytest`, C5
> `shell:` runs a command), it is built for **trusted, internal repositories** — not public CI
> taking forked pull requests. Pairs naturally with a coding agent such as **Claude Code**
> ([how](#using-dorian-with-claude-code)).
> taking forked pull requests by default (for public/fork PRs, `checker_trust: base` runs only
> base-approved checker specs — a trust root, still not a sandbox). Pairs naturally with a coding
> agent such as **Claude Code** ([how](#using-dorian-with-claude-code)).

## Table of contents

@@ -103,6 +104,12 @@ fold sha256:7920c71b5a6a9c8e WARRANTED -> REVOKED
The summary still reads perfectly. Its portrait flipped to **REVOKED** — and every artifact whose
warrant was built on it is flagged `recalled`, so nobody builds on a claim that silently went false.

> **Trust states.** A warrant is born **WARRANTED**. Each `revalidate` folds it to **TRUSTED**
> (all re-checked claims hold), **DEGRADED** or **REVOKED** (a claim broke — DEGRADED for a
> non-load-bearing break, REVOKED for a load-bearing one), or **UNKNOWN** (a checker could not
> run — ERROR is never silently green and never counted as broken). So `WARRANTED -> REVOKED`
> above is the born state folding on its first revalidation.
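
The fold described in the trust-states note can be sketched roughly as follows. This is an illustrative model only, not dorian's code: `Verdict`, `ClaimResult`, and `fold` are hypothetical names, and two details are assumptions: a broken claim takes precedence over an errored one (suggested by the README's "ERRORED-only" exit code), and an empty result list folds to TRUSTED.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """Outcome of re-running one claim's checker (hypothetical model)."""
    HOLDS = "holds"
    BROKEN = "broken"
    ERROR = "error"  # the checker could not run


@dataclass(frozen=True)
class ClaimResult:
    verdict: Verdict
    load_bearing: bool


def fold(results: list[ClaimResult]) -> str:
    """Fold per-claim results into one trust state."""
    broken = [r for r in results if r.verdict is Verdict.BROKEN]
    if any(r.load_bearing for r in broken):
        return "REVOKED"    # a load-bearing claim broke
    if broken:
        return "DEGRADED"   # only non-load-bearing claims broke
    if any(r.verdict is Verdict.ERROR for r in results):
        return "UNKNOWN"    # nothing broke, but a checker could not run
    return "TRUSTED"        # every re-checked claim holds
```

An ERROR result is never folded into TRUSTED and never counted as a break, which mirrors the "never silently green and never counted as broken" wording above.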

## We ran this on dorian itself

The `verify` and `revalidate` output above is exactly what dorian prints, shown for an illustrative
@@ -169,7 +176,10 @@ path-scope watcher (58 → 5 false alarms) and **10.4x** versus the stronger lin
1.00 by construction here; the meaningful axis is their precision.)

These numbers describe a synthetic fixture suite, not your repository, and are not a universal
performance claim. See [`docs/BENCHMARK_v0.7.0.md`](docs/BENCHMARK_v0.7.0.md) (protocol:
performance claim. The headline figures were **measured at v0.7.0** and are **historical**; the
current version reproduces them unchanged (240 pairs, P=R=0.93) — see the version-stamped
[`docs/BENCHMARK_CURRENT.md`](docs/BENCHMARK_CURRENT.md). See
[`docs/BENCHMARK_v0.7.0.md`](docs/BENCHMARK_v0.7.0.md) (protocol:
[`docs/BENCHMARK_PROTOCOL_v0.7.0.md`](docs/BENCHMARK_PROTOCOL_v0.7.0.md)); reproduce with
`dorian bench large-mutation`, and measure your own repos with the harness in `bench/`.

@@ -207,6 +217,8 @@ trigger-vs-truth ceiling, on a real class (**partial**). Two further cases (docu
sources, not reproduced) are honest misses (**not_solved**). These are scoped reproductions of public
problem classes — not universal validation.

The 808-pair figures above were **measured at dorian 0.9.0** and are **historical**; the
current-version rerun (same protocol) is in [`docs/BENCHMARK_CURRENT.md`](docs/BENCHMARK_CURRENT.md).
See [`docs/BENCHMARK_BINDING_LIFECYCLE.md`](docs/BENCHMARK_BINDING_LIFECYCLE.md) and
[`docs/REALWORLD_USECASES.md`](docs/REALWORLD_USECASES.md) (protocols alongside each); reproduce with
`dorian bench binding-lifecycle` and `dorian bench realworld-usecases`.
@@ -359,8 +371,10 @@ A warrant is worth only what its checkers actually catch. The full authoring con
load-bearing claim, **bind** the file that would change if the claim went false, **prefer**
shape-tolerant checks like `regex:`/`symbol:`/typed-C5 over brittle `string:`) — lives in
[`docs/AGENT_CLAIMS.md`](docs/AGENT_CLAIMS.md). Checker program grammars (C1 span, C3
path/symbol/string/regex, C4 `pytest:<nodeid>`, C5 typed data) are documented in
[`spec/checkers.md`](spec/checkers.md).
path/symbol/string/regex plus the V1 structural forms `py-signature:`/`py-const:` and the
comment/docstring-stripped `code:`, C4 `pytest:<nodeid>`, C5 typed data) are documented in
[`spec/checkers.md`](spec/checkers.md). What V1 strengthening does and does not promise is in
[`docs/V1_SCOPE.md`](docs/V1_SCOPE.md).

> **Checker programs are executable.** `dorian verify` *runs* every checker at seal time. C3 and typed
> C5 only inspect files, but C4 (`pytest:`) and C5 `shell:` execute code — review an agent-emitted
@@ -386,12 +400,16 @@ claims.
event: a flag only — downstream is never re-checked and its states are untouched. Re-seal with
`seal --supersede <old-id>` so downstream warrants sealed against the old id stay reachable.
- `dorian bindings <artifact>` — binding-quality diagnostics (unbacked, single-file, short-literal,
ambiguous-mention, trigger-only-symbol, unwatched-mention). Informational, never a gate; output
carries file paths only, never matched content. `ambiguous-mention` surfaces a load-bearing claim
whose symbol is defined in more than one file (so no definer is auto-watched); `trigger-only-symbol`
marks a watch added only as a re-check *trigger* that no checker actually exercises.
- `dorian bind-suggest --claims claims.json` — read-only preview of the symbol-definer files `verify`
would auto-bind for each claim (and the ambiguous symbols it would skip). Writes nothing, never a gate.
ambiguous-mention, trigger-only-symbol, unwatched-mention) **plus per-claim checker-strength and
claim-risk** (it classifies each checker's *truth strength* and flags adequacy mismatches — a
`behavior` claim backed only by an existence checker, a vacuous pytest node). Informational, never a
gate; output carries file paths only, never matched content.
- `dorian bind-suggest --claims claims.json` — read-only preview of the files `verify` would auto-bind
for each claim, **with provenance** (symbol-definer vs config-key), the ambiguous symbols/keys it
would skip, and any unparseable config file. Writes nothing, never a gate.
- `dorian revalidate --checker-source base` (also Action `checker_trust: base`; default `head`) —
resolve each claim's checker spec from the `--since` base ref so a PR-added or PR-modified executable
checker is never executed (public/fork PRs). Fail-closed, **not a sandbox** — pair with `--deny-exec`.
- `dorian rebind <artifact>` — re-derive a warrant's symbol-definer watches with the current binding
logic and re-seal it (born-verifiable, superseding the old id), so a warrant sealed before the symbol
index existed gains the wider watches. The watch only ever widens; a claim that has since become false
@@ -420,6 +438,9 @@ claims.
benchmark for symbol binding ([`docs/BENCHMARK_BINDING_LIFECYCLE.md`](docs/BENCHMARK_BINDING_LIFECYCLE.md)).
`dorian bench realworld-usecases` runs the offline public-case reproductions
([`docs/REALWORLD_USECASES.md`](docs/REALWORLD_USECASES.md)).
- `dorian bench warrant-quality <artifact>` — offline per-claim mutation scoring: for each claim, does
its checker catch the drift it implies (caught / missed / brittle / ceiling)? Deterministic, never
mutates the real repo. Separates trigger from verdict; see [`docs/V1_SCOPE.md`](docs/V1_SCOPE.md).

Exit codes: `0` ok/TRUSTED · `2` usage/infra (incl. a C1 or C5 `shell:` claim handed to `verify`) ·
`3` DEGRADED · `4` REVOKED/integrity · `5` ERRORED-only (checkers could not run — never conflated with
@@ -455,7 +476,8 @@ work perishable, so you find out when it expired.
([`docs/REALWORLD_USECASES.md`](docs/REALWORLD_USECASES.md)) reproduce real problem *classes*; the
next rung is frozen public-repo SHAs with manual claims and reproducible known-truth labels
([`docs/SOLO_VALIDATION_LADDER.md`](docs/SOLO_VALIDATION_LADDER.md)).
- **Tagged release and PyPI trusted publishing.**
- **PyPI trusted publishing** — tagged releases now ship (latest: **`v1.0.0rc1`**, a V1 release
candidate / prerelease); publishing `dorian-vwp` to PyPI via a Trusted Publisher is next.

Non-goals stay non-goals: no servers, no dashboards, no hosted control plane, no model at check time.
Local-first is the design center.
23 changes: 15 additions & 8 deletions SECURITY.md
@@ -64,14 +64,21 @@ inside your own sandbox (container, restricted user, no secrets in env).

## Public fork PR CI

dorian does **not** currently advertise a safe public-fork-PR mode. A trusted-base
Action design (run checkers from the base ref, never from untrusted head, deny-exec
by default for forks) is documented in
[docs/TRUSTED_BASE_ACTION_DESIGN.md](docs/TRUSTED_BASE_ACTION_DESIGN.md) but is **not
yet implemented or tested**. Until it is, the safe answer for public forks is
`--deny-exec` plus the standard caution that any executed checker still runs with
the runner's privileges. Do not wire dorian into `pull_request_target` with a
checkout of untrusted head.
For public/forked-PR CI, use **trusted-base checker-source mode**:
`dorian revalidate --checker-source base` (Action input `checker_trust: base`). It
resolves each claim's checker SPEC from the trusted base ref and runs it against the
PR-head sources, so a PR-added or PR-modified executable checker is never executed and
a PR rewriting a checker spec cannot self-attest a verdict (the base-approved spec
wins). A missing or tampered base sidecar **fails closed** (ERRORED, never executed).
This is implemented and proven by the test matrix in
[docs/TRUSTED_BASE_ACTION_DESIGN.md](docs/TRUSTED_BASE_ACTION_DESIGN.md) §6
(`tests/test_trusted_base.py`).

It is a **checker-source trust root, not a sandbox**: a base-approved `pytest:` checker
can still import and execute PR-head code. So for fully untrusted forks, combine
`checker_trust: base` **with `deny_exec: true`** (or external isolation) — any executed
checker still runs with the runner's privileges. Do not wire dorian into
`pull_request_target` with a checkout of untrusted head.
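
As a rough mental model of trusted-base resolution, the sketch below picks which checker spec may run for a claim. It is not dorian's implementation: `Resolution` and `resolve_spec` are hypothetical names, sidecars are reduced to plain `claim id -> program` mappings, and `base=None` stands in for a missing or tampered base sidecar. Treating a claim that exists only in the PR head as ERRORED is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Resolution:
    spec: Optional[str]  # checker program allowed to run, or None
    errored: bool        # True: fail closed, report ERRORED, execute nothing
    changed: bool        # True: the PR head carries a different spec than base


def resolve_spec(
    claim_id: str,
    head: Mapping[str, str],
    base: Optional[Mapping[str, str]],
    checker_source: str = "head",
) -> Resolution:
    """Choose the checker spec for one claim (illustrative only)."""
    if checker_source not in ("head", "base"):
        raise ValueError(f"unknown checker source: {checker_source!r}")
    head_spec = head.get(claim_id)
    if checker_source == "head":
        # Trusted/internal repos: the checked-out spec is used as-is.
        return Resolution(head_spec, errored=head_spec is None, changed=False)
    if base is None or claim_id not in base:
        # No base-approved spec to trust: never fall back to the PR's spec.
        return Resolution(None, errored=True, changed=False)
    base_spec = base[claim_id]
    # The base-approved spec wins; a differing head spec is only reported.
    return Resolution(base_spec, errored=False, changed=head_spec != base_spec)
```

In this model a PR that rewrites a spec changes only the `changed` flag, never the program that runs. The resolved program is still executed against PR-head sources, which is why the result is a trust root and not a sandbox.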

## Supported versions

76 changes: 47 additions & 29 deletions action/README.md
@@ -59,20 +59,21 @@ caveats:
1. A `.warrant` file is a **non-obvious executable input**. Reviewers who
would scrutinize a workflow or `conftest.py` change may wave through a
"docs-only" diff that swaps a checker `program`.
2. The verdict is **self-attested by the PR tree**. A PR can rewrite a
sidecar so a broken claim re-verifies; the trust root for what "should"
be checked is not yet the base branch.

**deny-exec input (partial mitigation, available now).** Set `deny_exec: true`
(or `deny_shell: true`) on the Action to refuse the executable checker families
during revalidation: C4 pytest and C5 shell ERROR instead of executing, so a
PR-authored sidecar cannot make this Action run its code. It flows through the
`DORIAN_DENY_EXEC` env fallback; the default `false` preserves today's behavior
for trusted/internal repos. This is fail-closed but **not a sandbox** and **not
yet a full public-fork story**: it removes code execution but does not address
the self-attested-verdict problem (a PR can still rewrite a *non-executable* C3
claim so a broken fact re-verifies). See `SECURITY.md` and
`docs/SECURITY_BOUNDARY.md`.
2. In the default `head` mode the verdict is **self-attested by the PR tree** — a
PR can rewrite a sidecar so a broken claim re-verifies. **`checker_trust: base`
fixes exactly this** (see below): it sources every checker spec from the base
ref, so a PR rewriting a spec can no longer weaken the verdict. Use `head` only
for trusted/internal repos.

**deny-exec input.** Set `deny_exec: true` (or `deny_shell: true`) on the Action to
refuse the executable checker families during revalidation: C4 pytest and C5 shell
ERROR instead of executing, so a PR-authored sidecar cannot make this Action run its
code. It flows through the `DORIAN_DENY_EXEC` env fallback; the default `false`
preserves today's behavior for trusted/internal repos. This is fail-closed but **not
a sandbox**: on its own it removes code execution but does not address the
self-attested-verdict problem for *non-executable* checkers — that is what
`checker_trust: base` adds, and the two compose (use both for untrusted forks). See
`SECURITY.md` and `docs/SECURITY_BOUNDARY.md`.

```yaml
# untrusted / public-fork posture
@@ -81,16 +82,30 @@ claim so a broken fact re-verifies). See `SECURITY.md` and
deny_exec: "true" # C4/C5 ERROR instead of executing
```
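
The difference between `deny_exec` and `deny_shell` can be illustrated with a small gate function. This is a hedged sketch, not dorian's code: `env_flag` and `refused` are hypothetical helpers, detecting the checker family from the `pytest:` / `shell:` program prefix is an assumption, and so is parsing the environment value as a case-insensitive `"true"`.

```python
from typing import Mapping


def env_flag(name: str, environ: Mapping[str, str]) -> bool:
    """Read a boolean flag from the environment (assumed format: 'true'/'false')."""
    return environ.get(name, "false").strip().lower() == "true"


def refused(program: str, environ: Mapping[str, str]) -> bool:
    """Return True when a checker program must ERROR instead of executing."""
    deny_exec = env_flag("DORIAN_DENY_EXEC", environ)
    deny_shell = env_flag("DORIAN_DENY_SHELL", environ)
    if program.startswith("shell:"):
        # C5 shell is blocked by either flag.
        return deny_exec or deny_shell
    if program.startswith("pytest:"):
        # C4 pytest is blocked only by the broader deny_exec.
        return deny_exec
    # Non-executable checkers (for example regex: or symbol:) only inspect files.
    return False
```

With both flags unset nothing is refused, which matches the default `false` behavior for trusted/internal repositories.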

**Current recommendation: trusted/internal repositories.** Until a
trusted-base mode exists (execute only checker specs already present on the
base branch; parse/lint — never execute — new or changed PR sidecars; skip
C5 `shell:` and other executable checkers in untrusted mode — designed in
[`docs/TRUSTED_BASE_ACTION_DESIGN.md`](../docs/TRUSTED_BASE_ACTION_DESIGN.md),
not yet implemented), this Action is recommended for repositories where
everyone who can open a PR is already trusted to run code in CI, or with
`deny_exec: true` for untrusted PRs. For public repositories, treat any PR that
touches a `.warrant` file as a code change requiring the same review as a CI
change.
**trusted-base mode (`checker_trust: base`).** This is the trust-root fix for the
self-attested-verdict problem. With `checker_trust: base`, the Action resolves each
claim's checker SPEC from the **base ref** and runs it against the PR-head sources, so
a PR-added or PR-modified executable checker is never executed and a rewritten checker
cannot self-attest a verdict — the base-approved spec wins, and the change is surfaced
in the PR comment. A missing or tampered base sidecar **fails closed** (ERRORED, never
executed). Implemented and proven by the
[trusted-base test matrix](../docs/TRUSTED_BASE_ACTION_DESIGN.md).

```yaml
# public / forked-PR posture: trusted checker specs + no code execution
- uses: ajaysurya1221/dorian/action@main
with:
checker_trust: base # run only base-approved checker specs
deny_exec: "true" # and refuse to execute even those (belt and braces)
```

**It is a checker-source trust root, not a sandbox.** A base-approved `pytest:` checker
can still import and execute PR-head code, so for fully untrusted forks combine
`checker_trust: base` **with** `deny_exec: true` (or external isolation). Default
`checker_trust: head` is unchanged and correct for trusted/internal repositories, where
everyone who can open a PR is already trusted to run code in CI. For public repositories,
treat any PR that touches a `.warrant` file as a code change requiring the same review as
a CI change.

Hard rules either way:

@@ -107,11 +122,14 @@ Hard rules either way:

## Inputs

| input | default | meaning |
| --------- | -------------------------------------------- | ------------------------------------------------------------------------ |
| `fail_on` | `revoked` | when to fail the step: `revoked` (exit 4 only), `degraded` (3 or 4), `never` |
| `base` | `${{ github.event.pull_request.base.sha }}` | git ref passed to `dorian revalidate --since` |
| `install` | `dorian-vwp` | pip spec; pin `dorian-vwp==0.6.*`, or `.` for checkout installs |
| input | default | meaning |
| --------------- | -------------------------------------------- | ------------------------------------------------------------------------ |
| `fail_on` | `revoked` | when to fail the step: `revoked` (exit 4 only), `degraded` (3 or 4), `never` |
| `base` | `${{ github.event.pull_request.base.sha }}` | git ref passed to `dorian revalidate --since` |
| `install` | `dorian-vwp` | pip spec; until the first PyPI release use the git source spec (below), or `.` for checkout installs |
| `deny_exec` | `false` | refuse to run executable checkers (C4 pytest, C5 shell): they ERROR. For untrusted/fork PRs; fail-closed, not a sandbox |
| `deny_shell` | `false` | narrower than `deny_exec`: block only C5 shell, still allow C4 pytest |
| `checker_trust` | `head` | `head` runs the checked-out checker spec (trusted repos); `base` runs the base-ref spec so PR-authored executable checkers never run (public/fork PRs) |
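
To make the `fail_on` row concrete, the sketch below maps a `dorian revalidate` exit code to a pass/fail decision for the step. It is an illustration of the table, not the Action's actual script: `step_fails` is a hypothetical helper, and it covers only the exit codes the table names (`3` DEGRADED, `4` REVOKED/integrity).

```python
# Exit codes that fail the step for each fail_on value, as listed in the table.
FAILING_CODES = {
    "revoked": {4},      # exit 4 only
    "degraded": {3, 4},  # DEGRADED or REVOKED
    "never": set(),      # report, but never fail the step
}


def step_fails(exit_code: int, fail_on: str = "revoked") -> bool:
    """Decide whether the CI step should fail for a given exit code."""
    if fail_on not in FAILING_CODES:
        raise ValueError(f"unknown fail_on value: {fail_on!r}")
    return exit_code in FAILING_CODES[fail_on]
```

How the Action treats other exit codes (for example `2` usage/infra or `5` ERRORED-only) is not stated in the table, so this sketch leaves them as non-failing.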

Until the first PyPI release of `dorian-vwp`, set `install` to a source spec:
`install: 'dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git'`.
19 changes: 17 additions & 2 deletions action/action.yml
@@ -23,8 +23,9 @@ inputs:
default: ${{ github.event.pull_request.base.sha }}
install:
description: >-
pip requirement spec for dorian. Pin a release ('dorian-vwp==0.6.*')
or pass '.' to install the checked-out source.
pip requirement spec for dorian. Until the first PyPI release, use a git
source spec ('dorian-vwp @ git+https://github.com/ajaysurya1221/dorian.git'),
pass '.' to install the checked-out source, or pin a tag once published.
required: false
default: dorian-vwp
deny_exec:
@@ -41,6 +42,17 @@ inputs:
Narrower than deny_exec: block only C5 shell, still allow C4 pytest.
required: false
default: "false"
checker_trust:
description: >-
Which sidecar a claim's checker SPEC is read from (the sources checked are
always the PR head). 'head' (default) runs the checked-out spec — correct for
trusted/internal repos. 'base' resolves each spec from the base ref, so a
PR-added or PR-modified executable checker is never executed and a rewritten
checker cannot self-attest a verdict — for public repos taking forked PRs.
Fail-closed, NOT a sandbox: a base-approved pytest checker can still execute
PR-head code. See docs/TRUSTED_BASE_ACTION_DESIGN.md.
required: false
default: head

runs:
using: composite
@@ -64,6 +76,9 @@ runs:
# unchanged. Set deny_exec: true for untrusted/fork PRs.
DORIAN_DENY_EXEC: ${{ inputs.deny_exec }}
DORIAN_DENY_SHELL: ${{ inputs.deny_shell }}
# checker_trust=base resolves checker specs from the base ref so a
# PR-authored executable checker never runs. 'head' (default) is unchanged.
DORIAN_CHECKER_SOURCE: ${{ inputs.checker_trust }}
run: |
set +e
dorian sync