Goal: Portable, Git-only AI reviewer that runs in CI on PR creation and on-demand, producing a separate review snapshot branch with:
- a human-readable review file, and
- small, cherry-pickable commits with `[skip ci]` to avoid loops.
Targets: Bitbucket Pipelines & Azure DevOps Pipelines via thin wrappers around a shared C# CLI (one-shot orchestration).
LLM tools (MVP order): OpenAI Codex CLI first; Gemini CLI & Copilot CLI supported next via the same `--exec` hook.
- No platform-native PR comments/status checks.
- No remote pushes from the LLM (wrappers push only).
- No scrubbing/filtering of diffs sent to the LLM (full diff by design).
- No support for fork PRs (skipped with a clear log).
- No submodule editing (ignored).
Trigger
- Auto on PR open; later runs on demand via a manual pipeline run.
C# CLI (one-shot) — `ai-review run`
- Detect PR context (env → git fallback).
- `git fetch --prune`; compute `merge-base`.
- Create `./.ai-review/metadata.json` + `diff.patch`.
- Determine timestamped SEQ `tYYYYMMDD-HHMMSS`.
- Create & checkout `ai-review/pr-<PR>-t<TS>`.
- Write `./.ai-review/INSTRUCTIONS.md` (approved minimal + guardrails).
- Launch LLM CLI via `--exec "codex …"` (90-min timeout), with no remote.
- On exit, list commits in `START..HEAD`; append SHAs & subjects into the review file.
- Print single-line JSON to stdout: `{"status":"ok","reviewBranch":"ai-review/pr-123-t20251002-143312", "prNumber":123, "timestamp":"20251002-143312", "commits":7, "reviewFile":".ai-reviews/review-pr-123-t20251002-143312.md", "durationSec":...}`
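The one-shot flow above can be sketched with plain git on a throwaway repo. This is an illustration only, not the real CLI: the PR number 123, file names, and branch names are made up; the diff flags are the ones listed in `metadata.json`.

```shell
# Sketch of the CLI's git steps on a throwaway repo (assumed flow, not the real CLI).
set -eu
REPO=$(mktemp -d); cd "$REPO"
git init -q -b main
git config user.email ci@example.com; git config user.name ci
echo base > f.txt; git add f.txt; git commit -qm "base"
git checkout -qb feature/xyz
echo change >> f.txt; git commit -qam "feature change"
MERGE_BASE=$(git merge-base main HEAD)       # common ancestor with the target branch
TS=$(date -u +%Y%m%d-%H%M%S)                 # timestamped SEQ (UTC)
BRANCH="ai-review/pr-123-t${TS}"             # hypothetical PR #123
mkdir -p .ai-review
git diff --find-renames --find-copies --binary --unified=5 \
  "${MERGE_BASE}...HEAD" > .ai-review/diff.patch
git checkout -qb "$BRANCH"                   # snapshot branch at the PR tip
echo "$BRANCH"
```

The real CLI would additionally write `metadata.json` and `INSTRUCTIONS.md` before launching the LLM.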
Wrapper (Bitbucket/ADO)
- Parse JSON → `REVIEW_BRANCH`.
- Run validation loop (auto-detect lint + tests) up to 3 total LLM runs (no global time cap; 90 min per run). Between runs, write `./.ai-review/feedback.json` and rerun `ai-review run --exec ...` if not green.
- Push `REVIEW_BRANCH` using SSH or HTTPS token (auto-select).
- Upload `./.ai-review/` as a CI artifact.
- Exit policy: lenient — the job fails only on fatal infra errors; otherwise succeed and note failures at the top of the review file.
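The retry loop has roughly this shape. A minimal sketch: `run_validation` and the attempt bookkeeping are invented stand-ins, and the actual `ai-review run` invocation is elided.

```shell
# Illustrative retry-loop shape (assumed wrapper logic, not the shipped functions).
MAX_RUNS=3
attempt=0
green=no
run_validation() {            # stand-in: pretend the 2nd attempt goes green
  [ "$attempt" -ge 2 ]
}
while [ "$attempt" -lt "$MAX_RUNS" ]; do
  attempt=$((attempt + 1))
  # ai-review run --exec "codex ..." would be invoked here (90-min cap per run)
  if run_validation; then green=yes; break; fi
  printf '{"attempt":%d,"status":"red"}\n' "$attempt" > feedback.json
done
echo "attempts=$attempt green=$green"
```

With the stub above the loop stops after the second attempt; a real run would push the branch either way under the lenient exit policy.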
Snapshot branch: `ai-review/pr-<PR>-t<YYYYMMDD-HHMMSS>` (no auto-deletion).

Review file (in repo): `.ai-reviews/review-pr-<PR>-t<YYYYMMDD-HHMMSS>.md` (immutable per run).

Bundle (CI artifacts): `./.ai-review/`
- `metadata.json`: `{ "prNumber": 123, "sourceBranch": "feature/xyz", "targetBranch": "main", "baseSha": "...", "mergeBaseSha": "...", "headSha": "...", "createdAt": "2025-10-09T13:20:00Z", "runTimestamp": "20251009-132000", "reviewBranch": "ai-review/pr-123-t20251009-132000", "repo": {"remoteUrl":"...", "defaultBranch":"main"}, "ci": {"provider":"bitbucket|azure-devops","runUrl":"..."}, "llmGuidance": { "commitStyle":"small, atomic, cherry-pickable", "includeReviewMd": true, "limits": {"maxFilesHint":10,"maxLocHint":400,"maxHunkLocHint":120} }, "commands": { "diff":"git diff --find-renames --find-copies --binary --unified=5 <merge-base>...<head>" } }`
- `diff.patch` (full unified diff).
- `INSTRUCTIONS.md` (approved text).
- Optional across retries: `feedback.json` (wrapper writes).
Header (CLI):
- PR `#<id>`, Timestamp (UTC), Tool (name/version), AI commits this run: `N`, base/head short SHAs
Sections:
- Summary (3–6 bullets) — LLM
- Applied fixes — CLI lists commit subject + short SHA; LLM rationale under each (1–3 lines, PR-review tone)
- Suggestions (manual) — LLM bullets
- Risks & follow-ups — LLM short list
- Subject: `fix: <summary> [skip ci]` (≤72 chars).
- Body: up to 10 lines, sounds like a PR comment (what/why/risks).
- Prefer PR-touched files; broaden only if clearly necessary.
- Deny touching: `.git/**`, CI folders, `node_modules/**`, `dist/**`, `vendor/**`, `**/*.lock`, `**/*.min.*`, common binaries.
- Pre-filter the diff to code/text types for the size caps; then cap at ~5,000 lines or 200 files.
- If still over the cap → write a short notice & generate only high-level notes; avoid code edits.
- Fork PRs: skip entirely (no run); print a clear log line explaining the skip (secrets/push restrictions).
image: ghcr.io/<org>/ai-review:v0.1.0@sha256:<digest>
pipelines:
pull-requests:
'**':
- step:
name: AI Review
script:
- RUN_OUT=$(ai-review run --exec "codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" --timeout 90m --out ./.ai-review)
- echo "$RUN_OUT"
- REVIEW_BRANCH=$(echo "$RUN_OUT" | jq -r '.reviewBranch')
# Retry loop (up to 3 total runs incl. first): auto-detect lint/tests; create feedback.json; re-run `ai-review run` if not green.
# (Implementation lives in a shared bash function sourced here.)
- git push origin "$REVIEW_BRANCH"
        # Pipeline config should exclude ai-review/** branches from triggers

pool: { vmImage: 'ubuntu-latest' }
container: ghcr.io/<org>/ai-review:v0.1.0@sha256:<digest>
pr: { branches: { include: ['*'] } }
steps:
- checkout: self
persistCredentials: true
- bash: |
RUN_OUT=$(ai-review run --exec "codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" --timeout 90m --out ./.ai-review)
echo "$RUN_OUT"
REVIEW_BRANCH=$(echo "$RUN_OUT" | jq -r '.reviewBranch')
# Retry loop as above...
git push origin "$REVIEW_BRANCH"
displayName: AI Review
    # Add trigger exclusions for ai-review/** and rely on [skip ci] in subjects

Push creds: wrappers auto-select SSH (deploy key/agent) or HTTPS token (ADO OAuth `System.AccessToken`, Bitbucket App Password).
Loop avoidance: `[skip ci]` and exclude `ai-review/**` from triggers.
- Base: Ubuntu 24.04.
- Includes: Git, Git-LFS, .NET 8 SDK, jq, ai-review CLI, Codex CLI, Node 20 LTS (+ pnpm, yarn, eslint, prettier, jest), Python 3.11 (+ ruff, black, pytest), Java 21 (Temurin, maven, gradle), Go 1.22 (+ golangci-lint), Rust stable (+ rustfmt, clippy).
- Non-root user; pre-populated `known_hosts` for `bitbucket.org` and `ssh.dev.azure.com`.
File: `.ai-review/config.json` (extended + lightweight policies): `{ "llm": { "tool":"codex", "exec":"codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" }, "timeouts": { "perRunMinutes": 90 }, "naming": { "branchPrefix":"ai-review", "reviewDir":".ai-reviews", "timestampFormat":"YYYYMMDD-HHMMSS" }, "policies": { "deny": ["**/*.png","**/*.jpg","**/*.exe","**/*.dll","**/*.so","**/*.pdf"], "caps": { "maxFiles":10, "maxLoc":400, "maxHunkLoc":120 } }, "workspaces": { "mode":"auto-or-root" }, "validation": { "autoDetectLintAndTests": true, "lintCommand": "", "testCommand": "" } }`

Precedence: CLI flags ▶ repo config ▶ env defaults.
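The precedence chain maps naturally onto shell default-expansion; a tiny sketch with hypothetical values (the variable names are illustrative, not the CLI's internals):

```shell
# Precedence sketch: CLI flag beats repo config beats env default.
ENV_DEFAULT_TIMEOUT=60
CONFIG_TIMEOUT=90          # value read from .ai-review/config.json (hypothetical)
FLAG_TIMEOUT=""            # no --timeout flag passed on this run
timeout=${FLAG_TIMEOUT:-${CONFIG_TIMEOUT:-$ENV_DEFAULT_TIMEOUT}}
echo "perRunMinutes=$timeout"   # prints: perRunMinutes=90
```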
Secrets (API keys) never in config — only via CI env.
- stdout: single-line JSON only (for wrapper parsing).
- stderr: concise status lines (start/end, tool, branch, duration).
- Commit a small run summary JSON on the review branch and upload the full bundle as CI artifacts.
- `0` — OK
- `10` — bundle failed (git/diff)
- `20` — LLM timeout (90 min exceeded)
- `21` — LLM exited non-zero
- `30` — snapshot failed (branch/checkout)
- `40` — invalid PR context (env + git fallback failed)
Wrappers fail the job only on 10/20/21/30/40 (lenient policy). Otherwise succeed.
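A minimal sketch of this lenient policy as a wrapper-side shell function (`handle_exit` is a hypothetical helper name):

```shell
# Lenient exit policy sketch: fail the job only on the fatal codes listed above.
handle_exit() {
  case "$1" in
    10|20|21|30|40) echo "fatal: ai-review exit $1"; return 1 ;;
    *)              echo "lenient: continuing (exit $1)"; return 0 ;;
  esac
}
handle_exit 0                          # prints: lenient: continuing (exit 0)
handle_exit 20 || echo "job would fail"
```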
- LLM has no remote: wrapper renames/removes `origin`; the environment explains this to avoid confusion.
- Push creds exist only in the wrapper step.
- Fork PRs: skipped entirely (logs explain).
- Submodules: ignored (no init/update).
- Full diff is provided to LLM (no scrubbing) — by design because the LLM runs in the same workspace and can read the repo. Teams concerned about data egress should run the pipeline in a private network and/or adopt a local/self-hosted LLM later.
- License: BUSL-1.1.
- Registry: GHCR (public): `ghcr.io/<org>/ai-review`.
- Versioning: SemVer tags for CLI & image (`v0.1.0`) + digests; wrappers pin by digest.
- PR context resolution (env → git fallback).
- Diff generation flags (renames/binary/unified).
- Timestamp naming & branch creation.
- JSON stdout correctness.
- Exit codes for each failure path.
- Happy path on a synthetic repo with a small PR; simulate LLM by a stub script that commits twice.
- Large PR capping behavior: generate a diff over threshold → ensure “high-level only” note.
- Retry loop: stub LLM that fails tests on first run, passes on second (wrapper validates & re-invokes).
- Monorepo auto-detect: TS workspace + .NET project simultaneously.
- Bitbucket & ADO sample pipeline runs on a test repo (internal), pushing to a sandbox remote with read/write creds.
Rules for the process (baked into docs):
- Do exactly one prioritized task per iteration; before/after, run build/lint/tests.
- When adding/updating tests, include a brief “why this test matters” comment.
- Before adding functionality, search the codebase to avoid re-implementing.
- After each iteration, append a concise update to `docs/implementation-progress.md`.
- Use CI-friendly, non-interactive commands.
Here’s the detailed, bottom-up Iterations Dev Plan. Each iteration is exactly one prioritized task. Every iteration includes: scope, changes, checks, tests (Given/When/Then + “why this test matters”), and deliverables.
- Do one change per iteration. Before/after: build, lint, tests (CI-friendly, non-interactive).
- When adding/updating tests, include a brief “why this test matters” comment.
- Search first (e.g., `rg`/`git grep`) to avoid re-implementing what already exists.
- After each iteration, append a concise note to `docs/implementation-progress.md`.
- Prefer non-interactive tooling and deterministic flags so CI can run hands-off.
Goal: Create the monorepo structure, license, and CI placeholders.
- Changes
  - Create folders: `/cli/` (C# .NET 8 CLI, ai-review), `/docker/` (Dockerfile + build scripts), `/wrappers/bitbucket/`, `/wrappers/azure-devops/`, `/docs/`.
  - Add `LICENSE` (BUSL-1.1), `README.md`, `CODEOWNERS`, `.editorconfig`, `.gitignore`.
  - Add `docs/implementation-progress.md`.
- Checks: `dotnet --info`, `git --version` (in container later).
- Tests: none (scaffold).
- Deliverables: Initial commit with structure & docs.
Goal: Wire a minimal .NET 8 console app with command surface.
- Changes
  - `/cli/AiReview.Cli/` .NET 8 exe.
  - Commands: `run` with flags `--pr`, `--out`, `--timeout`, `--exec`.
  - Exit codes reserved: `0, 10, 20, 21, 30, 40`.
- Checks: `dotnet build -c Release`.
- Tests
  - G/W/T: Given missing required flags, When running `ai-review run`, Then the exit code is ≠ 0 and usage is shown. why: validates CLI argument parsing and prevents silent no-ops.
- Deliverables: Built binary + basic usage doc.
Goal: Normalize PR context for Bitbucket/ADO or fallback to git.
- Changes
  - Read well-known envs; map to canonical: `PR_ID`, `SOURCE_BRANCH`, `TARGET_BRANCH`, `HEAD_SHA`.
  - Fallback: `git fetch --prune`; resolve `HEAD_SHA`, `BASE_TIP`, `MERGE_BASE` via `git merge-base`.
- Checks: `git fetch --prune --quiet`.
- Tests
  - Env path: Given PR env vars set, When `run`, Then metadata in memory matches env. why: ensures portability across providers.
  - Fallback: Given no PR envs, When the repo has branches, Then `merge-base` is computed and not empty. why: resilience outside CI.
- Deliverables: Internal PR context service.
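A hedged sketch of the env-first resolution. `BITBUCKET_PR_ID` and `SYSTEM_PULLREQUEST_PULLREQUESTID` are, to the best of my knowledge, the documented Bitbucket Pipelines and Azure DevOps PR variables, but treat the mapping as illustrative rather than the CLI's exact logic:

```shell
# Env-first PR context sketch (assumed mapping, not the real service).
BITBUCKET_PR_ID=123      # simulate a Bitbucket Pipelines PR build
PR_ID=${BITBUCKET_PR_ID:-${SYSTEM_PULLREQUEST_PULLREQUESTID:-}}
if [ -z "$PR_ID" ]; then
  # Neither provider env present: the real CLI would fall back to
  # `git fetch --prune` + `git merge-base` here.
  echo "no PR env found; falling back to git" >&2
fi
echo "PR_ID=$PR_ID"      # prints: PR_ID=123
```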
Goal: Emit the review bundle to ./.ai-review/.
- Changes
  - Create the folder; write `metadata.json` with context + timestamps.
  - Write `diff.patch` using: `git diff --find-renames --find-copies --binary --unified=5 <merge-base>...<head>`
- Checks: File existence, UTF-8, non-colored diff.
- Tests
  - Given a simple repo change, When `run`, Then `.ai-review/diff.patch` contains the expected file/hunk headers. why: guarantees deterministic input for the LLM.
- Deliverables: Bundle writer component.
Goal: Generate the approved INSTRUCTIONS.md.
- Changes
  - Write the static text with injected branch name, PR #, timestamp, review path.
- Checks: File exists; placeholders resolved.
- Tests
  - Given config and PR info, Then the instructions contain the resolved `<REVIEW_BRANCH>` and review file path. why: prevents LLM confusion / miswrites.
- Deliverables: Instructions generator.
Goal: Create and check out ai-review/pr-<PR>-t<YYYYMMDD-HHMMSS>.
- Changes
  - Timestamped SEQ (UTC `YYYYMMDD-HHMMSS`); create the branch from the PR `HEAD_SHA` via `git checkout -b`.
- Checks: `git rev-parse --abbrev-ref HEAD` equals the snapshot name.
- Tests
  - Given HEAD at the PR tip, When creating the snapshot, Then the branch points to the same commit. why: ensures cherry-pickable history.
- Deliverables: Branch naming & checkout service.
Goal: Execute --exec "<LLM CLI ...>", enforce timeout, capture exit.
- Changes
  - Spawn the process; stream its stderr to CLI stderr; enforce a 90-min timeout (kill on exceed).
  - Pre-run: set local git identity `AI Reviewer Bot <ai-review@local>`.
- Checks: Process exit code; the timeout path returns `20`.
- Tests
  - Success script exits 0 → CLI exit 0. why: happy path.
  - Failure script exits non-zero → CLI exit `21`. why: error mapping.
  - Sleep script > 90 min (simulated) → CLI exit `20`. why: watchdog works.
- Deliverables: Orchestration layer.
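The watchdog behavior can be sketched with coreutils `timeout`, mapping expiry to exit `20` and a plain failure to `21`. One second stands in for the 90-minute cap; the real CLI does this in-process in C#, so this is only an analogy:

```shell
# Watchdog analogy: timeout(1) reports 124 when the command is killed on expiry.
timeout 1 sleep 5
rc=$?
if [ "$rc" -eq 124 ]; then code=20      # LLM timeout -> reserved exit 20
elif [ "$rc" -ne 0 ]; then code=21      # LLM exited non-zero -> 21
else code=0
fi
echo "exit-code=$code"                  # prints: exit-code=20
```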
Goal: Append commit subjects + short SHAs to the review file.
- Changes
  - Record `START=$(git rev-parse HEAD)` before the LLM runs.
  - After exit: list `git log --format="%H%x09%s" --no-merges --reverse START..HEAD`.
  - Open/create `.ai-reviews/review-pr-<PR>-t<TS>.md`; write the Applied fixes section entries.
- Checks: `git merge-base --is-ancestor START HEAD` sanity; else warn.
- Tests
  - Given two commits by the stub, When the run ends, Then the report lists both subjects with 7-char SHAs. why: traceability for cherry-pick.
- Deliverables: Report enricher.
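A sketch of this step on a throwaway repo, with two commits standing in for the LLM's work (repo contents and subjects are made up):

```shell
# Report-enrichment sketch: record START, commit twice, list START..HEAD.
set -eu
REPO=$(mktemp -d); cd "$REPO"
git init -q -b main
git config user.email bot@local; git config user.name "AI Reviewer Bot"
echo a > f; git add f; git commit -qm "base"
START=$(git rev-parse HEAD)
echo b >> f; git commit -qam "fix: tighten null check [skip ci]"
echo c >> f; git commit -qam "fix: add missing await [skip ci]"
# Short SHA + subject, oldest first — the Applied fixes entries.
git log --format="%h%x09%s" --no-merges --reverse "$START..HEAD"
n=$(git log --oneline "$START..HEAD" | wc -l)
echo "commits=$n"
```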
Goal: Emit one-line JSON on stdout, logs on stderr.
- Changes
  - Print `{"status":"ok","reviewBranch":"...","prNumber":...,"timestamp":"...","commits":N,"reviewFile":"...","durationSec":...}`.
- Checks: stdout has exactly one JSON line; nothing extra.
- Tests
  - Given a successful run, Then stdout parses via `jq` and contains the branch + review file. why: wrappers can parse deterministically.
- Deliverables: Output contract finalized.
Goal: Build GHCR image with all runtimes & tools.
- Changes
  - `docker/Dockerfile`: Ubuntu 24.04; Git + Git-LFS; .NET 8; Node 20 (+ pnpm, yarn, eslint, prettier, jest); Python 3.11 (+ ruff, black, pytest); Java 21 (+ maven, gradle); Go 1.22 (+ golangci-lint); Rust stable (+ rustfmt, clippy); `jq`; Codex CLI; `ai-review` preinstalled.
  - Non-root user; `known_hosts` for Bitbucket & Azure DevOps.
- Checks: `docker build`; `docker run` sanity (`ai-review --help`).
- Tests
  - In the container: tools report versions; `ai-review` runs. why: reproducible CI runtime.
- Deliverables: Published image to GHCR (local tag for now).
Goal: YAML step that runs CLI and pushes the branch.
- Changes
  - `/wrappers/bitbucket/bitbucket-pipelines.yml` minimal example.
  - Auto-select push method (SSH key or app password).
  - Exclude `ai-review/**` from triggers.
- Checks: Dry-run in a test repo; confirm `git push origin <branch>`.
- Tests
  - Given a sandbox repo, Then the branch appears on the remote with the expected name. why: end-to-end viability on Bitbucket.
- Deliverables: Working wrapper snippet + README.
Goal: YAML step for ADO that runs CLI and pushes with OAuth.
- Changes
  - `/wrappers/azure-devops/azure-pipelines.yml` minimal example.
  - `checkout: self` with `persistCredentials: true`.
  - Exclude `ai-review/**` triggers.
- Checks: Dry-run; confirm the push works.
- Tests
  - Given a sandbox project, Then the branch is pushed and visible. why: parity across providers.
- Deliverables: Wrapper + README.
Goal: Wrapper logic: up to 3 total runs until green; else push anyway.
- Changes
  - Auto-detect commands per workspace (JS/TS, .NET, Python, Go, Rust, Java).
  - On failure: write `./.ai-review/feedback.json` (top errors + counts), invoke a second `ai-review run --exec ...`, up to 3 total.
- Checks: Wrapper exit policy is lenient (only infra errors fail).
- Tests
  - Stub LLM: the first run leaves failing tests, the second fixes them → the wrapper does 2 runs and pushes. why: validates the feedback-driven loop without flake.
- Deliverables: Shared bash functions + doc.
Goal: Bound LLM input size; fall back to summary-only when too large.
- Changes
  - Count diff lines/files over code/text types; if above the thresholds (e.g., 5k lines or 200 files), write a header notice and skip code edits.
- Checks: Thresholds configurable via config later (keep defaults for now).
- Tests
  - Synthetic large diff → the report contains a “too large” note, no edits attempted. why: avoids runaway cost/time.
- Deliverables: Size gate in the bundle stage.
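A rough sketch of the gate, using the thresholds from the plan; the pre-filter to code/text types is omitted for brevity, and the tiny hand-written patch is just test input:

```shell
# Size-gate sketch: count files and total diff lines against the plan's caps.
MAX_LINES=5000; MAX_FILES=200
cat > diff.patch <<'EOF'
diff --git a/a.txt b/a.txt
--- a/a.txt
+++ b/a.txt
@@ -1 +1,2 @@
 old
+new
EOF
files=$(grep -c '^diff --git ' diff.patch)
lines=$(wc -l < diff.patch)
if [ "$files" -gt "$MAX_FILES" ] || [ "$lines" -gt "$MAX_LINES" ]; then
  verdict="too-large: high-level notes only"   # write notice, skip code edits
else
  verdict="within-caps"
fi
echo "$verdict"
```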
Goal: Detect touched roots for validation run (auto-or-root).
- Changes
  - Parse `diff.patch` to collect paths; detect the nearest workspace/project markers; dedupe roots.
- Checks: Log detected roots to stderr (concise).
- Tests
  - PR touches `app-a/` (JS) and `svc-b/` (.NET) → both validation commands run once. why: targeted validation, faster CI.
- Deliverables: Workspace detector module.
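An illustrative first pass at root collection: pull paths out of the `diff --git` headers and keep the first path segment as the workspace root. The real detector would additionally walk up looking for markers such as `package.json` or `*.csproj`; the sample patch is made up.

```shell
# Root-detection sketch: extract the top-level directory per touched file, dedupe.
cat > diff.patch <<'EOF'
diff --git a/app-a/src/index.ts b/app-a/src/index.ts
diff --git a/svc-b/Program.cs b/svc-b/Program.cs
diff --git a/app-a/package.json b/app-a/package.json
EOF
roots=$(sed -n 's|^diff --git a/\([^/]*\)/.*|\1|p' diff.patch | sort -u)
echo "$roots"
```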
Goal: Ensure commits have [skip ci] and pipeline excludes review branches.
- Changes
  - CLI: validate that each new commit subject contains `[skip ci]`; if not, append a note to the review file and a stderr warning (do not rewrite).
  - Wrappers: confirm trigger exclusion patterns in YAML.
- Checks: Create a dummy commit without the tag → the warning appears.
- Tests
  - Given a commit subject missing `[skip ci]`, Then a warning block is added at the top of the review file. why: protects against loops without mutating commits.
- Deliverables: Guardrail + docs.
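The guardrail can be sketched as a subject check that warns without rewriting (`check_subject` is a made-up helper name; the real CLI does this in C#):

```shell
# Guardrail sketch: warn on a subject missing [skip ci]; never mutate the commit.
check_subject() {
  case "$1" in
    *"[skip ci]"*) return 0 ;;
    *) echo "WARNING: commit subject missing [skip ci]: $1" >&2; return 1 ;;
  esac
}
check_subject "fix: handle empty diff [skip ci]" && echo ok   # prints: ok
check_subject "fix: forgot the tag" || echo warned            # prints: warned
```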
Goal: Skip forked PRs (no secrets/push); print a clear reason.
- Changes
  - Detect fork context (provider env or repo owner mismatch); exit early with status OK and a note to stderr.
- Checks: Wrapper respects the skip and exits 0 (lenient).
- Tests
  - Simulated fork env → no work done, clear log line. why: secure default behavior for external PRs.
- Deliverables: Skip policy implemented.
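An illustrative owner-mismatch check. The repo slugs are hypothetical, and real detection would rely on provider env or a repo owner mismatch, per the plan:

```shell
# Fork-skip sketch: compare the owner segment of source vs target repo slugs.
SOURCE_REPO="outsider/tool"     # hypothetical fork
TARGET_REPO="acme/tool"         # hypothetical upstream
if [ "${SOURCE_REPO%%/*}" != "${TARGET_REPO%%/*}" ]; then
  echo "skip: fork PR (no secrets/push for external repos)" >&2
  status=skipped                 # exit 0 either way (lenient policy)
else
  status=run
fi
echo "$status"                   # prints: skipped
```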
Goal: Don’t init/update submodules; surface pointer change notes.
- Changes
  - If `.gitmodules` or a submodule SHA changed, add a short “not reviewed” note to the review file.
- Checks: Ensure no `git submodule` calls.
- Tests
  - PR that bumps a submodule pointer → the note is present. why: clarity without complexity.
- Deliverables: Clear behavior for submodules.
Goal: Persist results for observability.
- Changes
  - Commit `.ai-reviews/run-pr-<PR>-t<TS>.json` with counts/timings/tool.
  - Upload the entire `./.ai-review/` as a CI artifact (the wrappers do the upload).
- Checks: File present on the branch; artifact visible in the CI UI.
- Tests
  - Parsed run JSON fields (`reviewBranch`, `durationSec`) exist and are sane. why: enables debugging without logs.
- Deliverables: Summary writer + wrapper artifact step.
Goal: Build & publish image with SemVer tags and digest output.
- Changes
  - GitHub Actions (or a local script) to build/push `ghcr.io/<org>/ai-review:v0.1.0` with `v0.1`, `v0`, `latest`.
  - Emit the immutable digest; update wrapper YAML to show pinning by digest (a comment retains the human tag).
- Checks: `docker pull` by digest succeeds.
- Tests
  - A dry-run pipeline references the digest and starts. why: reproducibility across teams.
- Deliverables: Release doc + version bump checklist.
Goal: Final polish for developers/operators.
- Changes
  - `README`: quickstart, CI snippets (Bitbucket/ADO), configuration, FAQ.
  - `/wrappers/*/README`: provider-specific notes (tokens/SSH).
  - Troubleshooting: timeouts, large PR, fork skip, missing `[skip ci]`.
- Checks: Links & commands validated.
- Tests
  - N/A (docs), but run a smoke E2E once on each provider.
- Deliverables: Dev-ready docs; `docs/implementation-progress.md` updated.
- Add adapters for Gemini CLI and Copilot CLI as `--exec` presets.
- Configurable policies in `.ai-review/config.json` (caps, deny globs, timestamp format).
- Platform adapters (Bitbucket/ADO) for native comments/status checks (still optional).
- Local-first / self-hosted LLM mode.