Skip to content

release: merge dorian 1.0.0#7

Merged
ajaysurya1221 merged 3 commits into
mainfrom
week1-public-microbench
Jun 16, 2026
Merged

release: merge dorian 1.0.0#7
ajaysurya1221 merged 3 commits into
mainfrom
week1-public-microbench

Conversation

@ajaysurya1221

@ajaysurya1221 ajaysurya1221 commented Jun 16, 2026

Copy link
Copy Markdown
Owner

Merges the remotely gated dorian v1.0.0 release branch into main.

  • Final tag v1.0.0 peels to 2f1053431d8ed826556f1af6207344d5148a45a9 (version triad 1.0.0).
  • Remote release-gate.yml run 27604334644 is green: 3.11/3.12/3.13 test matrix + build-attest (SHA-256 + Sigstore build-provenance attestation).
  • GitHub Release published (non-prerelease) with wheel, sdist, and SHA256SUMS.
  • Fast-forward to main was not possible (branches diverged at 79136d5), so this PR is the non-destructive merge path.

PyPI publishing is a separate, deliberate action and is not part of this PR.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added config-value: checker form for type-aware TOML/JSON configuration assertions.
    • Added public real-repository micro-benchmark harness with machine-derived structural claims validation.
  • Chores

    • Bumped version to 1.0.0.
    • Enhanced release workflows with security hardening, build provenance attestations, and trusted publishing.
    • Refactored state persistence to use atomic transactions, preventing incomplete updates after crashes.
    • Established comprehensive test coverage for benchmark infrastructure and release gate logic.

ajay-dev-2112 and others added 3 commits June 16, 2026 10:02
…e + release hardening

Weeks 1-4 toward 1.0.0, on top of v1.0.0rc1.

Verification engine (Week 2):
- config-value: C3 structural checker (TOML/JSON, stdlib-only, type-aware), spec'd
- fold audit/state made one atomic transaction (store.apply_*_atomic)

Public benchmark (Weeks 1-3):
- bench/public_repos.py harness (frozen-SHA clone, seal->mutate->revalidate,
  trigger vs truth reported separately, fail-closed cache verify, honesty guard)
- bench/public_claims.py machine-derived claim synthesizer: operands extracted
  from source (ast/tomllib/json), known_truth OBSERVED by running the checker on
  the mutated copy (Chain-of-Verification), auto-rejects non-literal/ERROR targets
- executed humanize + python-dotenv at frozen SHAs: 4 machine-derived claims each
  (3 BROKEN + 1 TRUSTED control), byte-deterministic across two runs, 8/8 matched

Release gate (Weeks 3-4):
- bench/release_state.py: deterministic S0-S11 state machine; pure evaluate() +
  IO-isolated gather_facts(); emits one decision (PROMOTE/CUT_RC2/STAY_RC/HALT_*);
  --strict fails closed on missing evidence; decision on this tree = CUT_RC2_READY
  (new behavior landed after the rc1 tag)
- .github/workflows/release-gate.yml (build + 3.11/3.12/3.13 matrix + sha256 +
  artifact attestation), publish-testpypi.yml (TestPyPI Trusted-Publishing dry-run),
  ci.yml least-privilege permissions
- tests/test_action_security.py locks the Action log-injection fence, no
  pull_request_target, least-privilege scopes
- docs/{RELEASE_GATE_1_0,RELEASE_DECISION_1_0,BENCHMARK_PUBLIC_REAL_REPOS}.md

771 non-slow tests green; ruff + format clean; zero runtime deps added; version
files untouched. No tag/release/publish performed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a config-value: C3 checker for type-strict TOML/JSON key-path assertions, refactors store.py/fold.py to commit state changes and audit events atomically, introduces bench/public_claims.py for deterministic claim synthesis, bench/public_repos.py for a frozen-SHA mutation benchmark runner, and bench/release_state.py for an S0–S10 release-state evaluator. Three new GitHub Actions workflows (release-gate, public-microbench, publish-testpypi) and a security regression test suite are added. The package version is bumped to 1.0.0.

Changes

config-value checker and atomic state persistence

Layer / File(s) Summary
config-value C3 checker: spec, implementation, strength, and seal
spec/checkers.md, src/dorian/checkers/c3_ref.py, src/dorian/strength.py, src/dorian/seal.py, tests/test_config_value.py
Documents and implements config-value:<path>:<dotted.key.path>:<json-literal> with type-strict TOML/JSON comparison, registers structural strength, fixes path extraction in _derive_watch/referenced_paths for the new three-segment grammar, and validates all PASS/FAIL/ERROR branches.
Atomic claim/trust state + event persistence
src/dorian/store.py, src/dorian/fold.py, tests/test_fold.py
Extracts non-committing SQL helpers and adds apply_claim_state_atomic/apply_trust_state_atomic; updates fold.py to call these; tests verify combined commit and rollback-on-failure.

Public benchmark harness, release state evaluator, and CI/CD pipeline

Layer / File(s) Summary
Benchmark data models, manifest, and pinned artifacts
bench/public_repos.py (models/constants), bench/public/manifest.v1.yaml, bench/public/repos/...
Defines RepoSpec, Mutation, error types, and protocol constants; adds the v1 manifest pinning six repos with eligibility flags; includes benchmark-ready and draft claims, mutations, targets, and rejected arrays for humanize, python-dotenv, bandit, tomli, and jaffle_shop_duckdb, plus manifest contract tests.
Claim synthesizer
bench/public_claims.py, tests/test_bench_public_claim_synthesis.py
Implements extract_operand, synthesize, _synthesize_one, write_generated, and CLI; runs real C3 against a temp one-file repo, enforces exactly-once mutation and literal-only invariants, promotes to benchmark-ready only when all targets pass; tests cover promotion, rejection reasons, determinism, and provenance tagging.
Benchmark runner: seal→mutate→revalidate loop, checkout, and output
bench/public_repos.py (execution core/checkout/CLI), tests/test_bench_public_harness.py
Restores clean state, seals at frozen SHA, applies each mutation exactly once, maps revalidate to trigger/truth layers, resolves checkouts with fail-closed SHA verification; tests cover truth/trigger accounting, ERRORED vs BROKEN, deny_exec skip, C4/C5 gating, and cache reuse.
Overclaim guard, report rendering, and benchmark docs
bench/public_repos.py (report/guard), docs/BENCHMARK_PUBLIC_REAL_REPOS.md, tests/test_bench_public_report.py
Implements guard_overclaim and render_report with enforced required framing and forbidden-phrase rejection; documents harness results with frozen-SHA reproducibility framing; tests verify required wording, forbidden phrase absence, and that the published doc passes the honesty guard.
Release state evaluator: S0–S10 machine, fact gathering, and CLI
bench/release_state.py, tests/test_release_state.py, docs/RELEASE_GATE_1_0.md, docs/RELEASE_DECISION_1_0.md
Adds evaluate() running S0–S10 checks, gather_facts() for checkout-only fact collection, render_json/render_decision_md, and main(); tests cover all decision paths, RC tag auto-detection, determinism invariants, and helper unit behavior.
GitHub Actions workflows, security regression tests, version bump
.github/workflows/*.yml, tests/test_action_security.py, src/dorian/__init__.py, pyproject.toml, .gitignore, README.md, docs/BENCHMARK_CURRENT.md
Adds release-gate.yml, public-microbench.yml, publish-testpypi.yml, and least-privilege ci.yml; adds test_action_security.py asserting log-injection fences, no pull_request_target, top-level permissions blocks, and write-scope allowlisting; bumps version to 1.0.0.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as dorian bench public-repos
  participant main as public_repos.main()
  participant manifest as load_manifest
  participant checkout as _clone_or_reuse
  participant run as run_repo
  participant seal as seal (frozen SHA)
  participant revalidate
  participant writer as write_results

  CLI->>main: --manifest, --out, --workdir, [--deny-exec]
  main->>manifest: load + validate RepoSpecs
  loop each eligible repo
    main->>checkout: spec.url, frozen_sha
    checkout-->>main: verified checkout path
    main->>run: spec, checkout, claims_doc, mutations, policy
    run->>run: check NO_CLAIMS / SKIPPED_DENY_EXEC
    run->>seal: frozen commit, permissive policy
    loop each mutation
      run->>run: restore clean snapshot
      run->>run: apply from→to (exactly once)
      run->>revalidate: ExecutionPolicy
      revalidate-->>run: (observed_state, selected)
    end
    run-->>main: result {truth_layer, trigger_layer, status}
    main->>writer: JSON + JSONL per repo
  end
  main-->>CLI: stdout report or JSON
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • ajaysurya1221/dorian#3: Directly modifies the same src/dorian/seal.py referenced-path extraction and watch-list derivation code that this PR updates to handle the new config-value grammar segment.

Poem

🐇 Hop hop, the config key is found,
In TOML and JSON, truth is sound.
Mutations applied, verdicts observed,
Release gates checked, no overclaim served.
The rabbit seals commits with care,
Atomic events committed fair! ✨

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch week1-public-microbench

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Nitpick comments (2)
tests/test_config_value.py (1)

141-202: ⚡ Quick win

Add an invalid UTF-8 config test to lock config_unparseable behavior.

This suite should also pin behavior for unreadable/invalid-encoding bytes, so malformed config files cannot be treated as normal comparisons.

Suggested test addition
+def test_invalid_utf8_json_errors(tmp_path: Path) -> None:
+    (tmp_path / "bad.json").write_bytes(b'{"a":"\xff"}')
+    v, d = _check(tmp_path, "config-value:bad.json:a:1")
+    assert v is Verdict.ERROR and "config_unparseable" in d
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_config_value.py` around lines 141 - 202, Add a new test function
named test_invalid_utf8_errors after test_nan_infinity_expected_literal_errors
that verifies config files with invalid UTF-8 encoding are handled correctly.
Write a config file (for example named "invalid.json") containing invalid UTF-8
bytes using binary write mode, then call _check with a config-value spec
pointing to that file, and assert that it returns Verdict.ERROR with
"config_unparseable" in the diagnostic message, following the same pattern as
test_invalid_toml_errors and test_invalid_json_errors.
tests/test_action_security.py (1)

74-79: ⚡ Quick win

Add a regression test that enforces SHA-pinned uses: references.

This suite already enforces core workflow security invariants; adding pin checks here will prevent the current unpinned-action class from regressing.

🧪 Suggested test addition
 def test_every_workflow_declares_permissions():
     for wf in WORKFLOWS:
         assert "permissions:" in wf.read_text(encoding="utf-8"), (
             f"{wf.name} has no top-level permissions block (defaults to broad scopes)"
         )
+
+
+def test_workflow_actions_are_sha_pinned():
+    for wf in WORKFLOWS:
+        text = wf.read_text(encoding="utf-8")
+        for m in re.finditer(r"^\s*-\s+uses:\s+([^@\s]+)@([^\s]+)\s*$", text, flags=re.M):
+            ref = m.group(2)
+            assert re.fullmatch(r"[0-9a-f]{40}", ref), (
+                f"{wf.name} uses unpinned action ref: {m.group(1)}@{ref}"
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_action_security.py` around lines 74 - 79, Add a new test function
in the test_action_security.py file (alongside the existing
test_every_workflow_declares_permissions function) that iterates through all
WORKFLOWS and checks that every uses: reference in each workflow file is pinned
to a SHA hash rather than using mutable tags or branches. The test should parse
the workflow YAML content and validate that each uses: directive contains a SHA
hash reference (typically in the format owner/action@sha256:...), and assert
with a clear error message if any unpinned uses: references are found,
preventing regression of the unpinned-action security vulnerability.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/public-microbench.yml:
- Around line 49-54: The workflow file directly interpolates the inputs.repo
value into the shell command on line 53, creating a command injection
vulnerability if special characters are provided. Instead of using direct
template expansion like ${{ inputs.repo != '' && format('--repo {0}',
inputs.repo) || '' }} in the run command, pass the input value as an environment
variable through the env section of the step, then reference it safely in the
shell script with proper quoting or conditional logic that prevents shell
interpretation of the input.

In @.github/workflows/release-gate.yml:
- Line 34: Replace all mutable GitHub Actions version tags with immutable full
40-character commit SHAs across five workflow files. In
.github/workflows/release-gate.yml, pin actions/checkout at lines 34 and 53,
astral-sh/setup-uv at line 38, actions/setup-python at line 57, and
actions/upload-artifact at line 84 to their respective commit SHAs. In
.github/workflows/public-microbench.yml, pin actions/checkout at line 30,
astral-sh/setup-uv at line 31, and actions/upload-artifact at lines 37 and 58.
In .github/workflows/publish-testpypi.yml, pin actions/checkout at line 30,
actions/setup-python at line 33, actions/upload-artifact at lines 40 and 59, and
actions/download-artifact at line 54. In .github/workflows/publish.yml, pin
actions/checkout at line 29, actions/setup-python at line 32,
actions/upload-artifact at line 51, and actions/download-artifact at line 65. In
.github/workflows/ci.yml, pin actions/checkout at line 16 and astral-sh/setup-uv
at line 17. For each reference, look up the commit SHA corresponding to the
currently pinned version tag and replace the version reference (e.g., `@v6.0.3` or
`@release/v1`) with the full 40-character commit SHA (format:
`@abcdef1234567890abcdef1234567890abcdef12`).

In `@bench/public_repos.py`:
- Around line 668-671: Wrap risky calls in the dry_run block (at lines 668-671
in the _dry_run_plan function call and print/json.dumps operations) and the
execution block (at lines 678-691) in try-except handlers to normalize
repo-level runtime failures like decode errors, IO errors, and git failures into
controlled skips instead of letting exceptions abort the entire run. Catch broad
exception types that could be raised by _dry_run_plan, _render_plan, json.dumps,
and related execution calls, and handle them gracefully by logging the error and
continuing with the next item rather than propagating the exception up to
terminate the process.
- Around line 236-247: The file path in Mutation.file is not validated for path
traversal attacks before being used in filesystem operations. Add validation
logic to the field checking block (where you validate that required fields like
"id", "claim_id", "file", "from", "to" are present) to also verify that the file
path from raw["file"] does not contain path traversal sequences like ".." or
start with "/" to prevent escape from the checkout directory. Raise a
MutationError if the file path is invalid, similar to the existing None checks.

In `@bench/public/results/REPORT.md`:
- Around line 3-13: The REPORT.md documentation contains terminology
inconsistencies that need alignment with v1.0.0. In the introductory text on
line 3, change "VERIFIED" to "trusted" to match the actual terminology used in
the table below. Additionally, on line 13, update the version reference from
"1.0.0rc1" to "1.0.0" to accurately reflect that this PR represents the v1.0.0
release merge path, not a release candidate. These changes ensure the
documentation accurately represents the current release version and uses
consistent terminology throughout.

In `@bench/release_state.py`:
- Around line 699-701: The default value for the --decision-out argument is
currently set to docs/RELEASE_DECISION_1_0.md, which conflicts with the artifact
policy that specifies commit-specific decision output should be stored as
release artifacts (under .release/ or similar) rather than in the source
documentation directory. Change the default path to align with the documented
artifact storage location (e.g., .release/RELEASE_DECISION_1_0.md or another
appropriate artifact directory) to prevent overwriting source documentation.
- Around line 625-627: The workflow file scan in line 626 uses glob pattern
"*.yml" which excludes workflow files with the ".yaml" extension. This causes
unsafe or required workflows defined as ".yaml" to be invisible to S8/S9/S10
checks. Update the glob pattern in the _read call to match both "*.yml" and
"*.yaml" file extensions so that all workflow definitions are included in the
fact scan, either by using a pattern like "*.y{a,}ml" or by combining results
from two separate glob calls for "*.yml" and "*.yaml".
- Around line 294-304: The mb_dispatch_only check on line 296 is too permissive
because it only verifies that pull_request_target is absent, but does not verify
that workflow_dispatch is the ONLY trigger present. A workflow could have
workflow_dispatch AND other unsafe triggers like push or pull_request and still
pass. You need to strengthen the mb_dispatch_only logic to explicitly check that
dangerous triggers (push, pull_request, schedule, etc.) are not present in the
microbench workflow. Similarly, the ci_perms check on line 294 is too lenient as
it only looks for the presence of the "permissions:" string without validating
the actual content of the permissions block. You need to enhance the ci_perms
check to verify that the permissions are actually properly scoped and restricted
(for example, checking that the permissions are set to read-only or contain
appropriate access limitations) rather than just checking for the string's
existence.
- Around line 705-708: The `--strict` argument in the add_argument call for
strict gating is currently opt-in with action="store_true", meaning strict mode
defaults to False and allows missing gate evidence to avoid halting. Change this
so strict mode is the default by adding default=True to the argument definition,
ensuring that missing gate evidence will result in HALT_INSUFFICIENT_EVIDENCE
unless explicitly overridden by a flag to disable strict mode.

In `@README.md`:
- Line 482: Update the documentation to reflect the final v1.0.0 release instead
of the release candidate. Replace all references to `v1.0.0rc2` with `v1.0.0` in
the following locations: README.md at line 482, and docs/BENCHMARK_CURRENT.md at
lines 13, 24-25. This ensures the documentation accurately reflects the final
release version per commit 2f10534.

In `@src/dorian/checkers/c3_ref.py`:
- Around line 253-257: The read_text call with errors="replace" silently
converts malformed UTF-8 bytes into replacement characters, allowing the config
file to be parsed successfully when it should return an error. Remove the
errors="replace" parameter from the path.read_text() call to enforce strict
UTF-8 decoding, and add UnicodeDecodeError to the exception handler to catch and
properly return an error result for files with invalid UTF-8 encoding.

In `@src/dorian/store.py`:
- Around line 390-425: In both apply_claim_state_atomic and
apply_trust_state_atomic functions, add validation to check that the target row
exists before attempting the state update. Specifically, in
apply_claim_state_atomic, verify that the warrant_id and claim_id combination
exists in the database before calling _set_claim_state_sql; in
apply_trust_state_atomic, verify that the warrant_id exists before calling
_set_trust_state_sql. Place these validation checks inside the with conn
transaction block but before the state update calls. If the target does not
exist, raise an appropriate exception (such as ValueError or a custom exception)
to prevent _append_event_sql from being called and persisting an audit event for
a non-existent state transition.

In `@tests/test_bench_public_report.py`:
- Around line 94-95: The assertion at lines 94-95 in the loop over
pr.REQUIRED_PHRASES currently checks if phrase appears anywhere in doc, but
should instead check if it appears in body to ensure the required framing is in
the publishable narrative text rather than potentially only in glossary
sections. Change the assertion from checking phrase in doc to checking phrase in
body, keeping the same error message format.

---

Nitpick comments:
In `@tests/test_action_security.py`:
- Around line 74-79: Add a new test function in the test_action_security.py file
(alongside the existing test_every_workflow_declares_permissions function) that
iterates through all WORKFLOWS and checks that every uses: reference in each
workflow file is pinned to a SHA hash rather than using mutable tags or
branches. The test should parse the workflow YAML content and validate that each
uses: directive contains a SHA hash reference (typically in the format
owner/action@sha256:...), and assert with a clear error message if any unpinned
uses: references are found, preventing regression of the unpinned-action
security vulnerability.

In `@tests/test_config_value.py`:
- Around line 141-202: Add a new test function named test_invalid_utf8_errors
after test_nan_infinity_expected_literal_errors that verifies config files with
invalid UTF-8 encoding are handled correctly. Write a config file (for example
named "invalid.json") containing invalid UTF-8 bytes using binary write mode,
then call _check with a config-value spec pointing to that file, and assert that
it returns Verdict.ERROR with "config_unparseable" in the diagnostic message,
following the same pattern as test_invalid_toml_errors and
test_invalid_json_errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0ca624a8-2991-4091-82ca-1fec2dd116b2

📥 Commits

Reviewing files that changed from the base of the PR and between c886610 and 2f10534.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (50)
  • .github/workflows/ci.yml
  • .github/workflows/public-microbench.yml
  • .github/workflows/publish-testpypi.yml
  • .github/workflows/release-gate.yml
  • .gitignore
  • README.md
  • bench/public/manifest.v1.yaml
  • bench/public/reports/.gitkeep
  • bench/public/repos/bandit/claims.json
  • bench/public/repos/humanize/claims.draft.json
  • bench/public/repos/humanize/claims.generated.json
  • bench/public/repos/humanize/claims.json
  • bench/public/repos/humanize/mutations.yaml
  • bench/public/repos/humanize/rejected.json
  • bench/public/repos/humanize/targets.json
  • bench/public/repos/jaffle_shop_duckdb/claims.json
  • bench/public/repos/python-dotenv/claims.draft.json
  • bench/public/repos/python-dotenv/claims.generated.json
  • bench/public/repos/python-dotenv/claims.json
  • bench/public/repos/python-dotenv/mutations.yaml
  • bench/public/repos/python-dotenv/rejected.json
  • bench/public/repos/python-dotenv/targets.json
  • bench/public/repos/tomli/claims.json
  • bench/public/results/.gitignore
  • bench/public/results/.gitkeep
  • bench/public/results/REPORT.md
  • bench/public_claims.py
  • bench/public_repos.py
  • bench/release_state.py
  • docs/BENCHMARK_CURRENT.md
  • docs/BENCHMARK_PUBLIC_REAL_REPOS.md
  • docs/RELEASE_DECISION_1_0.md
  • docs/RELEASE_GATE_1_0.md
  • pyproject.toml
  • spec/checkers.md
  • src/dorian/__init__.py
  • src/dorian/checkers/c3_ref.py
  • src/dorian/commands.py
  • src/dorian/fold.py
  • src/dorian/seal.py
  • src/dorian/store.py
  • src/dorian/strength.py
  • tests/test_action_security.py
  • tests/test_bench_public_claim_synthesis.py
  • tests/test_bench_public_harness.py
  • tests/test_bench_public_manifest.py
  • tests/test_bench_public_report.py
  • tests/test_config_value.py
  • tests/test_fold.py
  • tests/test_release_state.py

Comment on lines +49 to +54
run: >-
uv run python -m dorian.cli bench public-repos
--manifest bench/public/manifest.v1.yaml
--out bench/public/results --workdir .bench/public-work
${{ inputs.repo != '' && format('--repo {0}', inputs.repo) || '' }}
${{ inputs.deny_exec && '--deny-exec' || '' }}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid raw shell interpolation of workflow input in the harness command.

Line 53 expands inputs.repo directly into run: script text; this can become command injection if special characters are provided.

🔧 Suggested fix
       - name: Run harness (fail-closed re-check)
-        run: >-
-          uv run python -m dorian.cli bench public-repos
-          --manifest bench/public/manifest.v1.yaml
-          --out bench/public/results --workdir .bench/public-work
-          ${{ inputs.repo != '' && format('--repo {0}', inputs.repo) || '' }}
-          ${{ inputs.deny_exec && '--deny-exec' || '' }}
+        env:
+          INPUT_REPO: ${{ inputs.repo }}
+          INPUT_DENY_EXEC: ${{ inputs.deny_exec }}
+        run: |
+          args=(
+            --manifest bench/public/manifest.v1.yaml
+            --out bench/public/results
+            --workdir .bench/public-work
+          )
+          if [ -n "$INPUT_REPO" ]; then
+            args+=(--repo "$INPUT_REPO")
+          fi
+          if [ "$INPUT_DENY_EXEC" = "true" ]; then
+            args+=(--deny-exec)
+          fi
+          uv run python -m dorian.cli bench public-repos "${args[@]}"
🧰 Tools
🪛 zizmor (1.25.2)

[error] 53-53: code injection via template expansion (template-injection): may expand into attacker-controllable code

(template-injection)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/public-microbench.yml around lines 49 - 54, The workflow
file directly interpolates the inputs.repo value into the shell command on line
53, creating a command injection vulnerability if special characters are
provided. Instead of using direct template expansion like ${{ inputs.repo != ''
&& format('--repo {0}', inputs.repo) || '' }} in the run command, pass the input
value as an environment variable through the env section of the step, then
reference it safely in the shell script with proper quoting or conditional logic
that prevents shell interpretation of the input.

Source: Linters/SAST tools

matrix:
python: ["3.11", "3.12", "3.13"]
steps:
- uses: actions/checkout@v6.0.3

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Lists workflow action uses that are NOT pinned to a 40-char commit SHA.
rg -n '^\s*-\s+uses:\s+[^@\s]+@([^\s]+)\s*$' .github/workflows \
  | rg -v '@[0-9a-fA-F]{40}$'

Repository: ajaysurya1221/dorian

Length of output: 1363


Pin all uses: action references to immutable commit SHAs.
The same root cause appears across five workflows: mutable tag-based action refs (@v*, @release/v1) violate hardened supply-chain policy and can drift unexpectedly.

  • .github/workflows/release-gate.yml#L34: pin actions/checkout to a full 40-char commit SHA
  • .github/workflows/release-gate.yml#L38: pin astral-sh/setup-uv to a full 40-char commit SHA
  • .github/workflows/release-gate.yml#L53: pin actions/checkout to a full 40-char commit SHA
  • .github/workflows/release-gate.yml#L57: pin actions/setup-python to a full 40-char commit SHA
  • .github/workflows/release-gate.yml#L84: pin actions/upload-artifact to a full 40-char commit SHA
  • .github/workflows/public-microbench.yml#L30: pin actions/checkout to a full 40-char commit SHA
  • .github/workflows/public-microbench.yml#L31: pin astral-sh/setup-uv to a full 40-char commit SHA
  • .github/workflows/public-microbench.yml#L58: pin actions/upload-artifact to a full 40-char commit SHA
  • .github/workflows/publish-testpypi.yml#L30: pin actions/checkout to a full 40-char commit SHA
  • .github/workflows/publish-testpypi.yml#L33: pin actions/setup-python to a full 40-char commit SHA
  • .github/workflows/publish-testpypi.yml#L40: pin actions/upload-artifact to a full 40-char commit SHA
  • .github/workflows/publish-testpypi.yml#L54: pin actions/download-artifact to a full 40-char commit SHA
  • .github/workflows/publish.yml#L29: pin actions/checkout to a full 40-char commit SHA
  • .github/workflows/publish.yml#L32: pin actions/setup-python to a full 40-char commit SHA
  • .github/workflows/publish.yml#L51: pin actions/upload-artifact to a full 40-char commit SHA
  • .github/workflows/publish.yml#L65: pin actions/download-artifact to a full 40-char commit SHA
  • .github/workflows/ci.yml#L16: pin actions/checkout to a full 40-char commit SHA
  • .github/workflows/ci.yml#L17: pin astral-sh/setup-uv to a full 40-char commit SHA
🧰 Tools
🪛 zizmor (1.25.2)

[warning] 34-37: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)


[error] 34-34: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

📍 Affects 3 files
  • .github/workflows/release-gate.yml#L34-L34 (this comment)
  • .github/workflows/release-gate.yml#L38-L38
  • .github/workflows/release-gate.yml#L53-L53
  • .github/workflows/release-gate.yml#L57-L57
  • .github/workflows/release-gate.yml#L81-L81
  • .github/workflows/release-gate.yml#L84-L84
  • .github/workflows/public-microbench.yml#L30-L30
  • .github/workflows/public-microbench.yml#L31-L31
  • .github/workflows/public-microbench.yml#L37-L37
  • .github/workflows/public-microbench.yml#L58-L58
  • .github/workflows/publish-testpypi.yml#L30-L30
  • .github/workflows/publish-testpypi.yml#L33-L33
  • .github/workflows/publish-testpypi.yml#L40-L40
  • .github/workflows/publish-testpypi.yml#L54-L54
  • .github/workflows/publish-testpypi.yml#L59-L59
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release-gate.yml at line 34, Replace all mutable GitHub
Actions version tags with immutable full 40-character commit SHAs across five
workflow files. In .github/workflows/release-gate.yml, pin actions/checkout at
lines 34 and 53, astral-sh/setup-uv at line 38, actions/setup-python at line 57,
and actions/upload-artifact at line 84 to their respective commit SHAs. In
.github/workflows/public-microbench.yml, pin actions/checkout at line 30,
astral-sh/setup-uv at line 31, and actions/upload-artifact at lines 37 and 58.
In .github/workflows/publish-testpypi.yml, pin actions/checkout at line 30,
actions/setup-python at line 33, actions/upload-artifact at lines 40 and 59, and
actions/download-artifact at line 54. In .github/workflows/publish.yml, pin
actions/checkout at line 29, actions/setup-python at line 32,
actions/upload-artifact at line 51, and actions/download-artifact at line 65. In
.github/workflows/ci.yml, pin actions/checkout at line 16 and astral-sh/setup-uv
at line 17. For each reference, look up the commit SHA corresponding to the
currently pinned version tag and replace the version reference (e.g., `@v6.0.3` or
`@release/v1`) with the full 40-character commit SHA (format:
`@abcdef1234567890abcdef1234567890abcdef12`).

Source: Linters/SAST tools

Comment thread bench/public_repos.py
Comment on lines +236 to +247
for fld in ("id", "claim_id", "file", "from", "to"):
if raw.get(fld) is None:
raise MutationError(f"mutations[{i}]: missing '{fld}'")
out.append(
Mutation(
id=raw["id"],
claim_id=raw["claim_id"],
file=raw["file"],
op=raw.get("op", "replace"),
from_=raw["from"],
to=raw["to"],
known_truth=kt,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Block path traversal in mutation file paths before any filesystem access.

Mutation.file is trusted as-is. A crafted path (../... or absolute) can escape the checkout and be read/written by snapshot/apply/restore logic.

Suggested fix
@@
 def parse_mutations(path: Path) -> list[Mutation]:
@@
         for fld in ("id", "claim_id", "file", "from", "to"):
             if raw.get(fld) is None:
                 raise MutationError(f"mutations[{i}]: missing '{fld}'")
+        rel_file = str(raw["file"])
+        rel_path = Path(rel_file)
+        if rel_path.is_absolute() or ".." in rel_path.parts:
+            raise MutationError(
+                f"mutations[{i}]: file must be a repo-relative path without traversal: {rel_file!r}"
+            )
         out.append(
             Mutation(
                 id=raw["id"],
                 claim_id=raw["claim_id"],
-                file=raw["file"],
+                file=rel_file,
                 op=raw.get("op", "replace"),
                 from_=raw["from"],
                 to=raw["to"],
                 known_truth=kt,
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for fld in ("id", "claim_id", "file", "from", "to"):
if raw.get(fld) is None:
raise MutationError(f"mutations[{i}]: missing '{fld}'")
out.append(
Mutation(
id=raw["id"],
claim_id=raw["claim_id"],
file=raw["file"],
op=raw.get("op", "replace"),
from_=raw["from"],
to=raw["to"],
known_truth=kt,
for fld in ("id", "claim_id", "file", "from", "to"):
if raw.get(fld) is None:
raise MutationError(f"mutations[{i}]: missing '{fld}'")
rel_file = str(raw["file"])
rel_path = Path(rel_file)
if rel_path.is_absolute() or ".." in rel_path.parts:
raise MutationError(
f"mutations[{i}]: file must be a repo-relative path without traversal: {rel_file!r}"
)
out.append(
Mutation(
id=raw["id"],
claim_id=raw["claim_id"],
file=rel_file,
op=raw.get("op", "replace"),
from_=raw["from"],
to=raw["to"],
known_truth=kt,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bench/public_repos.py` around lines 236 - 247, The file path in Mutation.file
is not validated for path traversal attacks before being used in filesystem
operations. Add validation logic to the field checking block (where you validate
that required fields like "id", "claim_id", "file", "from", "to" are present) to
also verify that the file path from raw["file"] does not contain path traversal
sequences like ".." or start with "/" to prevent escape from the checkout
directory. Raise a MutationError if the file path is invalid, similar to the
existing None checks.

Comment thread bench/public_repos.py
Comment on lines +668 to +671
if args.dry_run:
plan = _dry_run_plan(specs, manifest_dir, deny_exec=args.deny_exec)
print(json.dumps(plan, sort_keys=True, indent=2) if args.json else _render_plan(plan))
return 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize repo-level runtime failures to controlled skips instead of aborting the whole run.

Planning/execution can still terminate hard on decode/IO/git failures because some risky calls are outside try and some exception types are not normalized.

Suggested fix
@@
     if args.dry_run:
-        plan = _dry_run_plan(specs, manifest_dir, deny_exec=args.deny_exec)
+        try:
+            plan = _dry_run_plan(specs, manifest_dir, deny_exec=args.deny_exec)
+        except (ManifestError, MutationError, OSError, json.JSONDecodeError) as exc:
+            print(f"dorian bench public-repos: {exc}", file=sys.stderr)
+            return 2
         print(json.dumps(plan, sort_keys=True, indent=2) if args.json else _render_plan(plan))
         return 0
@@
     for spec in specs:
         if not spec.eligible:
             print(f"skip {spec.name}: ineligible ({spec.reason})", file=sys.stderr)
             continue
-        claims_doc = _load_optional_claims(manifest_dir, spec)
-        mutations = _load_optional_mutations(manifest_dir, spec)
-        # don't clone a repo run_repo will skip (deny-exec skip mirrors the dry-run plan)
-        will_run = is_benchmark_ready(claims_doc) and not (
-            spec.needs_exec and not policy.allow_exec
-        )
         try:
+            claims_doc = _load_optional_claims(manifest_dir, spec)
+            mutations = _load_optional_mutations(manifest_dir, spec)
+            # don't clone a repo run_repo will skip (deny-exec skip mirrors the dry-run plan)
+            will_run = is_benchmark_ready(claims_doc) and not (
+                spec.needs_exec and not policy.allow_exec
+            )
             checkout = _clone_or_reuse(spec, manifest_dir, workdir) if will_run else workdir
             result = run_repo(spec, checkout, claims_doc, mutations, policy=policy, workdir=workdir)
-        except (ManifestError, MutationError) as exc:
+        except (
+            ManifestError,
+            MutationError,
+            OSError,
+            subprocess.CalledProcessError,
+            json.JSONDecodeError,
+        ) as exc:
             # fail-closed: a misconfigured manifest or an unverifiable cache yields NO result
             # for that repo (never a fabricated one); other repos still run.
             print(f"skip {spec.name}: {exc}", file=sys.stderr)
             continue

Also applies to: 678-691

🧰 Tools
🪛 ast-grep (0.43.0)

[info] 669-669: use jsonify instead of json.dumps for JSON output
Context: json.dumps(plan, sort_keys=True, indent=2)
Note: Security best practice.

(use-jsonify)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bench/public_repos.py` around lines 668 - 671, Wrap risky calls in the
dry_run block (at lines 668-671 in the _dry_run_plan function call and
print/json.dumps operations) and the execution block (at lines 678-691) in
try-except handlers to normalize repo-level runtime failures like decode errors,
IO errors, and git failures into controlled skips instead of letting exceptions
abort the entire run. Catch broad exception types that could be raised by
_dry_run_plan, _render_plan, json.dumps, and related execution calls, and handle
them gracefully by logging the error and continuing with the next item rather
than propagating the exception up to terminate the process.

Comment on lines +3 to +13
These are **candidate benchmark subjects** pinned at frozen commit SHAs. Any numbers below are **reproducible on these frozen SHAs** only; they do not transfer to other repositories and are not a real-world performance claim. The **trigger and truth layers reported separately**: trigger = was the affected claim re-checked; truth = was it BROKEN / VERIFIED / ERRORED. ERRORED (a checker that could not run) is never an alarm.

| repo | sha | license | status | trigger re-checked/skipped | truth broken/trusted/errored |
|---|---|---|---|---|---|
| humanize | `2ddb5903cdc1` | MIT | PASS | 4/0 | 3/1/0 |
| python-dotenv | `36004e0e34be` | BSD-3-Clause | PASS | 4/0 | 3/1/0 |
| tomli | `c5f44690c68c` | MIT | NO_CLAIMS | 0/0 | 0/0/0 |
| bandit | `92ae8b82fb42` | Apache-2.0 | NO_CLAIMS | 0/0 | 0/0/0 |
| jaffle_shop_duckdb | `36bde6cba69d` | Apache-2.0 | NO_CLAIMS | 0/0 | 0/0/0 |

_dorian 1.0.0rc1; binding selects WHEN a claim is re-checked, the checker decides WHETHER it is false; `--deny-exec`/`--deny-shell` are fail-closed policies, not sandboxes._

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align the published report text with v1.0.0 and current truth terminology.

Line 3 mixes VERIFIED wording while the table is trusted, and Line 13 still says 1.0.0rc1 despite this PR being the v1.0.0 release merge path. This can mislead readers about the current release artifact.

✏️ Proposed doc fix
-These are **candidate benchmark subjects** pinned at frozen commit SHAs. Any numbers below are **reproducible on these frozen SHAs** only; they do not transfer to other repositories and are not a real-world performance claim. The **trigger and truth layers reported separately**: trigger = was the affected claim re-checked; truth = was it BROKEN / VERIFIED / ERRORED. ERRORED (a checker that could not run) is never an alarm.
+These are **candidate benchmark subjects** pinned at frozen commit SHAs. Any numbers below are **reproducible on these frozen SHAs** only; they do not transfer to other repositories and are not a real-world performance claim. The **trigger and truth layers reported separately**: trigger = was the affected claim re-checked; truth = was it BROKEN / TRUSTED / ERRORED. ERRORED (a checker that could not run) is never an alarm.
@@
-_dorian 1.0.0rc1; binding selects WHEN a claim is re-checked, the checker decides WHETHER it is false; `--deny-exec`/`--deny-shell` are fail-closed policies, not sandboxes._
+_dorian 1.0.0; binding selects WHEN a claim is re-checked, the checker decides WHETHER it is false; `--deny-exec`/`--deny-shell` are fail-closed policies, not sandboxes._
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
These are **candidate benchmark subjects** pinned at frozen commit SHAs. Any numbers below are **reproducible on these frozen SHAs** only; they do not transfer to other repositories and are not a real-world performance claim. The **trigger and truth layers reported separately**: trigger = was the affected claim re-checked; truth = was it BROKEN / VERIFIED / ERRORED. ERRORED (a checker that could not run) is never an alarm.
| repo | sha | license | status | trigger re-checked/skipped | truth broken/trusted/errored |
|---|---|---|---|---|---|
| humanize | `2ddb5903cdc1` | MIT | PASS | 4/0 | 3/1/0 |
| python-dotenv | `36004e0e34be` | BSD-3-Clause | PASS | 4/0 | 3/1/0 |
| tomli | `c5f44690c68c` | MIT | NO_CLAIMS | 0/0 | 0/0/0 |
| bandit | `92ae8b82fb42` | Apache-2.0 | NO_CLAIMS | 0/0 | 0/0/0 |
| jaffle_shop_duckdb | `36bde6cba69d` | Apache-2.0 | NO_CLAIMS | 0/0 | 0/0/0 |
_dorian 1.0.0rc1; binding selects WHEN a claim is re-checked, the checker decides WHETHER it is false; `--deny-exec`/`--deny-shell` are fail-closed policies, not sandboxes._
These are **candidate benchmark subjects** pinned at frozen commit SHAs. Any numbers below are **reproducible on these frozen SHAs** only; they do not transfer to other repositories and are not a real-world performance claim. The **trigger and truth layers reported separately**: trigger = was the affected claim re-checked; truth = was it BROKEN / TRUSTED / ERRORED. ERRORED (a checker that could not run) is never an alarm.
| repo | sha | license | status | trigger re-checked/skipped | truth broken/trusted/errored |
|---|---|---|---|---|---|
| humanize | `2ddb5903cdc1` | MIT | PASS | 4/0 | 3/1/0 |
| python-dotenv | `36004e0e34be` | BSD-3-Clause | PASS | 4/0 | 3/1/0 |
| tomli | `c5f44690c68c` | MIT | NO_CLAIMS | 0/0 | 0/0/0 |
| bandit | `92ae8b82fb42` | Apache-2.0 | NO_CLAIMS | 0/0 | 0/0/0 |
| jaffle_shop_duckdb | `36bde6cba69d` | Apache-2.0 | NO_CLAIMS | 0/0 | 0/0/0 |
_dorian 1.0.0; binding selects WHEN a claim is re-checked, the checker decides WHETHER it is false; `--deny-exec`/`--deny-shell` are fail-closed policies, not sandboxes._
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bench/public/results/REPORT.md` around lines 3 - 13, The REPORT.md
documentation contains terminology inconsistencies that need alignment with
v1.0.0. In the introductory text on line 3, change "VERIFIED" to "trusted" to
match the actual terminology used in the table below. Additionally, on line 13,
update the version reference from "1.0.0rc1" to "1.0.0" to accurately reflect
that this PR represents the v1.0.0 release merge path, not a release candidate.
These changes ensure the documentation accurately represents the current release
version and uses consistent terminology throughout.

Comment thread bench/release_state.py
Comment on lines +705 to +708
"--strict",
action="store_true",
help="missing gate evidence is HALT_INSUFFICIENT_EVIDENCE, not SKIP",
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail-closed strict mode should be the CLI default.

Line 705 makes strict gating opt-in, which allows missing gate evidence to avoid a halt path. For a release gate, defaulting to strict=False weakens decision reliability.

Suggested fix
-    ap.add_argument(
-        "--strict",
-        action="store_true",
-        help="missing gate evidence is HALT_INSUFFICIENT_EVIDENCE, not SKIP",
-    )
+    ap.add_argument(
+        "--strict",
+        action=argparse.BooleanOptionalAction,
+        default=True,
+        help="missing gate evidence is HALT_INSUFFICIENT_EVIDENCE, not SKIP",
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"--strict",
action="store_true",
help="missing gate evidence is HALT_INSUFFICIENT_EVIDENCE, not SKIP",
)
ap.add_argument(
"--strict",
action=argparse.BooleanOptionalAction,
default=True,
help="missing gate evidence is HALT_INSUFFICIENT_EVIDENCE, not SKIP",
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bench/release_state.py` around lines 705 - 708, The `--strict` argument in
the add_argument call for strict gating is currently opt-in with
action="store_true", meaning strict mode defaults to False and allows missing
gate evidence to avoid halting. Change this so strict mode is the default by
adding default=True to the argument definition, ensuring that missing gate
evidence will result in HALT_INSUFFICIENT_EVIDENCE unless explicitly overridden
by a flag to disable strict mode.

Comment thread README.md
([`docs/BENCHMARK_PUBLIC_REAL_REPOS.md`](docs/BENCHMARK_PUBLIC_REAL_REPOS.md)). These are
**reproducible on those frozen SHAs only** — not a real-world performance claim; the trigger and
truth layers are reported separately.
- **PyPI trusted publishing** — tagged releases now ship (latest: **`v1.0.0rc2`**, a V1 release

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check git tags related to the version
git tag | grep -E '(1\.0\.0|v1\.0\.0)' | sort -V

# Check if the specific commit mentioned exists
git log --oneline | grep -i '2f10534' | head -5

Repository: ajaysurya1221/dorian

Length of output: 88


🏁 Script executed:

# Read README.md around line 482
sed -n '475,490p' README.md

# Read docs/BENCHMARK_CURRENT.md to check lines 13 and 24-26
cat -n docs/BENCHMARK_CURRENT.md | sed -n '10,30p'

Repository: ajaysurya1221/dorian

Length of output: 2383


Update documentation to reflect final v1.0.0 release. The final release is v1.0.0 (per commit 2f10534), but README.md line 482 and docs/BENCHMARK_CURRENT.md (lines 13, 24–25) incorrectly reference v1.0.0rc2 as the current version. Update these files to reference v1.0.0 instead of v1.0.0rc2.

📍 Affects 2 files
  • README.md#L482-L482 (this comment)
  • docs/BENCHMARK_CURRENT.md#L13-L13
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 482, Update the documentation to reflect the final v1.0.0
release instead of the release candidate. Replace all references to `v1.0.0rc2`
with `v1.0.0` in the following locations: README.md at line 482, and
docs/BENCHMARK_CURRENT.md at lines 13, 24-25. This ensures the documentation
accurately reflects the final release version per commit 2f10534.

Comment on lines +253 to +257
raw = path.read_text(encoding="utf-8", errors="replace")
try:
data = tomllib.loads(raw) if path.suffix.lower() == ".toml" else json.loads(raw)
except (tomllib.TOMLDecodeError, json.JSONDecodeError, ValueError) as exc:
return CheckResult(Verdict.ERROR, detail=f"config_unparseable: {type(exc).__name__}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use strict UTF-8 decode for config files (don’t silently replace bytes).

Decoding with errors="replace" can turn malformed config bytes into parseable text, producing PASS/FAIL where this checker should return ERROR(config_unparseable).

Suggested patch
-    raw = path.read_text(encoding="utf-8", errors="replace")
-    try:
-        data = tomllib.loads(raw) if path.suffix.lower() == ".toml" else json.loads(raw)
-    except (tomllib.TOMLDecodeError, json.JSONDecodeError, ValueError) as exc:
+    try:
+        raw = path.read_bytes().decode("utf-8")
+        data = tomllib.loads(raw) if path.suffix.lower() == ".toml" else json.loads(raw)
+    except (
+        OSError,
+        UnicodeDecodeError,
+        tomllib.TOMLDecodeError,
+        json.JSONDecodeError,
+        ValueError,
+    ) as exc:
         return CheckResult(Verdict.ERROR, detail=f"config_unparseable: {type(exc).__name__}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
raw = path.read_text(encoding="utf-8", errors="replace")
try:
data = tomllib.loads(raw) if path.suffix.lower() == ".toml" else json.loads(raw)
except (tomllib.TOMLDecodeError, json.JSONDecodeError, ValueError) as exc:
return CheckResult(Verdict.ERROR, detail=f"config_unparseable: {type(exc).__name__}")
try:
raw = path.read_bytes().decode("utf-8")
data = tomllib.loads(raw) if path.suffix.lower() == ".toml" else json.loads(raw)
except (
OSError,
UnicodeDecodeError,
tomllib.TOMLDecodeError,
json.JSONDecodeError,
ValueError,
) as exc:
return CheckResult(Verdict.ERROR, detail=f"config_unparseable: {type(exc).__name__}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/dorian/checkers/c3_ref.py` around lines 253 - 257, The read_text call
with errors="replace" silently converts malformed UTF-8 bytes into replacement
characters, allowing the config file to be parsed successfully when it should
return an error. Remove the errors="replace" parameter from the path.read_text()
call to enforce strict UTF-8 decoding, and add UnicodeDecodeError to the
exception handler to catch and properly return an error result for files with
invalid UTF-8 encoding.

Comment thread src/dorian/store.py
Comment on lines +390 to +425
def apply_claim_state_atomic(
conn: sqlite3.Connection,
warrant_id: str,
claim_id: str,
state: str,
at_iso: str,
*,
actor: str,
kind: str,
cause: dict | None,
) -> None:
"""Persist a claim-state change AND its audit event in ONE transaction: both land or
neither does. An exception (e.g. a crash before/while writing the event) rolls back the
state UPDATE, so the store never holds a state change without its audit event. `with
conn` commits on success and rolls back on any exception."""
with conn:
_set_claim_state_sql(conn, warrant_id, claim_id, state, at_iso)
_append_event_sql(
conn, warrant_id=warrant_id, claim_id=claim_id, actor=actor, kind=kind, cause=cause
)


def apply_trust_state_atomic(
conn: sqlite3.Connection,
warrant_id: str,
state: str,
*,
actor: str,
kind: str,
cause: dict | None,
) -> None:
"""Persist a trust-state change AND its fold audit event in ONE transaction (both or
neither), so a crash never leaves a trust change without its event."""
with conn:
_set_trust_state_sql(conn, warrant_id, state)
_append_event_sql(conn, warrant_id=warrant_id, actor=actor, kind=kind, cause=cause)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate target existence inside the atomic transaction before appending the event.

Line [406] and Line [424] can be no-op updates when the target row is absent, but Line [407] and Line [425] still append an audit event. That can persist an event without a corresponding state transition. Please enforce existence in-transaction and raise before event insertion.

Proposed fix
 def apply_claim_state_atomic(
@@
 ) -> None:
@@
     with conn:
+        row = conn.execute(
+            "SELECT 1 FROM claim WHERE warrant_id = ? AND id = ?",
+            (warrant_id, claim_id),
+        ).fetchone()
+        if row is None:
+            raise KeyError(f"{warrant_id}:{claim_id}")
         _set_claim_state_sql(conn, warrant_id, claim_id, state, at_iso)
         _append_event_sql(
             conn, warrant_id=warrant_id, claim_id=claim_id, actor=actor, kind=kind, cause=cause
         )
@@
 def apply_trust_state_atomic(
@@
 ) -> None:
@@
     with conn:
+        row = conn.execute("SELECT 1 FROM warrant WHERE id = ?", (warrant_id,)).fetchone()
+        if row is None:
+            raise KeyError(warrant_id)
         _set_trust_state_sql(conn, warrant_id, state)
         _append_event_sql(conn, warrant_id=warrant_id, actor=actor, kind=kind, cause=cause)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/dorian/store.py` around lines 390 - 425, In both apply_claim_state_atomic
and apply_trust_state_atomic functions, add validation to check that the target
row exists before attempting the state update. Specifically, in
apply_claim_state_atomic, verify that the warrant_id and claim_id combination
exists in the database before calling _set_claim_state_sql; in
apply_trust_state_atomic, verify that the warrant_id exists before calling
_set_trust_state_sql. Place these validation checks inside the with conn
transaction block but before the state update calls. If the target does not
exist, raise an appropriate exception (such as ValueError or a custom exception)
to prevent _append_event_sql from being called and persisting an audit event for
a non-existent state transition.

Comment on lines +94 to +95
for phrase in pr.REQUIRED_PHRASES:
assert phrase in doc, f"benchmark doc missing required framing: {phrase!r}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate required phrasing against body, not full doc.

At Lines 94-95, the assertion checks phrase in doc, so the test can pass if wording appears only in the “Allowed vs forbidden” glossary. That misses the stated intent of enforcing framing in the publishable narrative body.

Suggested fix
-    for phrase in pr.REQUIRED_PHRASES:
-        assert phrase in doc, f"benchmark doc missing required framing: {phrase!r}"
+    for phrase in pr.REQUIRED_PHRASES:
+        assert phrase in body, f"benchmark doc body missing required framing: {phrase!r}"

Downstream impact: once this is fixed, update docs/BENCHMARK_PUBLIC_REAL_REPOS.md body text to contain the exact required phrase reproducible on these frozen SHAs outside the glossary section.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_bench_public_report.py` around lines 94 - 95, The assertion at
lines 94-95 in the loop over pr.REQUIRED_PHRASES currently checks if phrase
appears anywhere in doc, but should instead check if it appears in body to
ensure the required framing is in the publishable narrative text rather than
potentially only in glossary sections. Change the assertion from checking phrase
in doc to checking phrase in body, keeping the same error message format.

@ajaysurya1221 ajaysurya1221 merged commit 86ef812 into main Jun 16, 2026
8 checks passed
@ajaysurya1221 ajaysurya1221 deleted the week1-public-microbench branch June 16, 2026 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants