Goal: Portable, Git-only AI reviewer that runs in CI on PR creation and on-demand, producing a separate review snapshot branch with:
- a human-readable review file, and
- small, cherry-pickable commits with `[skip ci]` to avoid loops.
Targets: Bitbucket Pipelines & Azure DevOps Pipelines via thin wrappers around a shared C# CLI (one-shot orchestration).
LLM tools (MVP order): OpenAI Codex CLI first; Gemini CLI & Copilot CLI supported next via the same `--exec` hook.
- No platform-native PR comments/status checks.
- No remote pushes from the LLM (wrappers push only).
- No scrubbing/filtering of diffs sent to the LLM (full diff by design).
- No support for fork PRs (skipped with a clear log).
- No submodule editing (ignored).
Trigger
- Auto on PR open; later runs on demand via a manual pipeline run.
C# CLI (one-shot) — `ai-review run`
- Detect PR context (env → git fallback).
- `git fetch --prune`; compute `merge-base`.
- Create `./.ai-review/metadata.json` + `diff.patch`.
- Determine timestamped SEQ `tYYYYMMDD-HHMMSS`.
- Create & checkout `ai-review/pr-<PR>-t<TS>`.
- Write `./.ai-review/INSTRUCTIONS.md` (approved minimal + guardrails).
- Launch LLM CLI via `--exec "codex …"` (90-min timeout), with no remote.
- On exit, list commits in `START..HEAD`; append SHAs & subjects into the review file.
- Print single-line JSON to stdout: `{"status":"ok","reviewBranch":"ai-review/pr-123-t20251002-143312", "prNumber":123, "timestamp":"20251002-143312", "commits":7, "reviewFile":".ai-reviews/review-pr-123-t20251002-143312.md", "durationSec":...}`
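The one-shot flow above can be sketched with plain git on a throwaway repo. This is an illustration only, not the real CLI: the PR number 123, file names, and branch names are made up; the diff flags are the ones listed in `metadata.json`.

```shell
# Sketch of the CLI's git steps on a throwaway repo (assumed flow, not the real CLI).
set -eu
REPO=$(mktemp -d); cd "$REPO"
git init -q -b main
git config user.email ci@example.com; git config user.name ci
echo base > f.txt; git add f.txt; git commit -qm "base"
git checkout -qb feature/xyz
echo change >> f.txt; git commit -qam "feature change"
MERGE_BASE=$(git merge-base main HEAD)       # common ancestor with the target branch
TS=$(date -u +%Y%m%d-%H%M%S)                 # timestamped SEQ (UTC)
BRANCH="ai-review/pr-123-t${TS}"             # hypothetical PR #123
mkdir -p .ai-review
git diff --find-renames --find-copies --binary --unified=5 \
  "${MERGE_BASE}...HEAD" > .ai-review/diff.patch
git checkout -qb "$BRANCH"                   # snapshot branch at the PR tip
echo "$BRANCH"
```

The real CLI would additionally write `metadata.json` and `INSTRUCTIONS.md` before launching the LLM.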
Wrapper (Bitbucket/ADO)
- Parse JSON → `REVIEW_BRANCH`.
- Run validation loop (auto-detect lint + tests) up to 3 total LLM runs (no global time cap; 90 min per run). Between runs, write `./.ai-review/feedback.json` and rerun `ai-review run --exec ...` if not green.
- Push `REVIEW_BRANCH` using SSH or HTTPS token (auto-select).
- Upload `./.ai-review/` as a CI artifact.
- Exit policy: lenient — the job fails only on fatal infra errors; otherwise succeed and note failures at the top of the review file.
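The retry loop has roughly this shape. A minimal sketch: `run_validation` and the attempt bookkeeping are invented stand-ins, and the actual `ai-review run` invocation is elided.

```shell
# Illustrative retry-loop shape (assumed wrapper logic, not the shipped functions).
MAX_RUNS=3
attempt=0
green=no
run_validation() {            # stand-in: pretend the 2nd attempt goes green
  [ "$attempt" -ge 2 ]
}
while [ "$attempt" -lt "$MAX_RUNS" ]; do
  attempt=$((attempt + 1))
  # ai-review run --exec "codex ..." would be invoked here (90-min cap per run)
  if run_validation; then green=yes; break; fi
  printf '{"attempt":%d,"status":"red"}\n' "$attempt" > feedback.json
done
echo "attempts=$attempt green=$green"
```

With the stub above the loop stops after the second attempt; a real run would push the branch either way under the lenient exit policy.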
Snapshot branch: `ai-review/pr-<PR>-t<YYYYMMDD-HHMMSS>` (no auto-deletion).

Review file (in repo): `.ai-reviews/review-pr-<PR>-t<YYYYMMDD-HHMMSS>.md` (immutable per run).

Bundle (CI artifacts): `./.ai-review/`
- `metadata.json`: `{ "prNumber": 123, "sourceBranch": "feature/xyz", "targetBranch": "main", "baseSha": "...", "mergeBaseSha": "...", "headSha": "...", "createdAt": "2025-10-09T13:20:00Z", "runTimestamp": "20251009-132000", "reviewBranch": "ai-review/pr-123-t20251009-132000", "repo": {"remoteUrl":"...", "defaultBranch":"main"}, "ci": {"provider":"bitbucket|azure-devops","runUrl":"..."}, "llmGuidance": { "commitStyle":"small, atomic, cherry-pickable", "includeReviewMd": true, "limits": {"maxFilesHint":10,"maxLocHint":400,"maxHunkLocHint":120} }, "commands": { "diff":"git diff --find-renames --find-copies --binary --unified=5 <merge-base>...<head>" } }`
- `diff.patch` (full unified diff).
- `INSTRUCTIONS.md` (approved text).
- Optional across retries: `feedback.json` (wrapper writes).
Header (CLI):
- PR `#<id>`, Timestamp (UTC), Tool (name/version), AI commits this run: `N`, base/head short SHAs
Sections:
- Summary (3–6 bullets) — LLM
- Applied fixes — CLI lists commit subject + short SHA; LLM rationale under each (1–3 lines, PR-review tone)
- Suggestions (manual) — LLM bullets
- Risks & follow-ups — LLM short list
- Subject: `fix: <summary> [skip ci]` (≤72 chars).
- Body: up to 10 lines, sounds like a PR comment (what/why/risks).
- Prefer PR-touched files; broaden only if clearly necessary.
- Deny touching: `.git/**`, CI folders, `node_modules/**`, `dist/**`, `vendor/**`, `**/*.lock`, `**/*.min.*`, common binaries.
- Pre-filter the diff to code/text types for the size caps; then cap at ~5,000 lines or 200 files.
- If still over the cap → write a short notice & generate only high-level notes; avoid code edits.
- Fork PRs: skip entirely (no run); print a clear log line explaining the skip (secrets/push restrictions).
image: ghcr.io/<org>/ai-review:v0.1.0@sha256:<digest>
pipelines:
pull-requests:
'**':
- step:
name: AI Review
script:
- RUN_OUT=$(ai-review run --exec "codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" --timeout 90m --out ./.ai-review)
- echo "$RUN_OUT"
- REVIEW_BRANCH=$(echo "$RUN_OUT" | jq -r '.reviewBranch')
# Retry loop (up to 3 total runs incl. first): auto-detect lint/tests; create feedback.json; re-run `ai-review run` if not green.
# (Implementation lives in a shared bash function sourced here.)
- git push origin "$REVIEW_BRANCH"
        # Pipeline config should exclude ai-review/** branches from triggers

pool: { vmImage: 'ubuntu-latest' }
container: ghcr.io/<org>/ai-review:v0.1.0@sha256:<digest>
pr: { branches: { include: ['*'] } }
steps:
- checkout: self
persistCredentials: true
- bash: |
RUN_OUT=$(ai-review run --exec "codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" --timeout 90m --out ./.ai-review)
echo "$RUN_OUT"
REVIEW_BRANCH=$(echo "$RUN_OUT" | jq -r '.reviewBranch')
# Retry loop as above...
git push origin "$REVIEW_BRANCH"
displayName: AI Review
    # Add trigger exclusions for ai-review/** and rely on [skip ci] in subjects

Push creds: wrappers auto-select SSH (deploy key/agent) or HTTPS token (ADO OAuth `System.AccessToken`, Bitbucket App Password).
Loop avoidance: `[skip ci]` and exclude `ai-review/**` from triggers.
- Base: Ubuntu 24.04.
- Includes: Git, Git-LFS, .NET 8 SDK, jq, ai-review CLI, Codex CLI, Node 20 LTS (+ pnpm, yarn, eslint, prettier, jest), Python 3.11 (+ ruff, black, pytest), Java 21 (Temurin, maven, gradle), Go 1.22 (+ golangci-lint), Rust stable (+ rustfmt, clippy).
- Non-root user; pre-populated `known_hosts` for `bitbucket.org` and `ssh.dev.azure.com`.
File: `.ai-review/config.json` (extended + lightweight policies): `{ "llm": { "tool":"codex", "exec":"codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" }, "timeouts": { "perRunMinutes": 90 }, "naming": { "branchPrefix":"ai-review", "reviewDir":".ai-reviews", "timestampFormat":"YYYYMMDD-HHMMSS" }, "policies": { "deny": ["**/*.png","**/*.jpg","**/*.exe","**/*.dll","**/*.so","**/*.pdf"], "caps": { "maxFiles":10, "maxLoc":400, "maxHunkLoc":120 } }, "workspaces": { "mode":"auto-or-root" }, "validation": { "autoDetectLintAndTests": true, "lintCommand": "", "testCommand": "" } }`

Precedence: CLI flags ▶ repo config ▶ env defaults.
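The precedence chain maps naturally onto shell default-expansion; a tiny sketch with hypothetical values (the variable names are illustrative, not the CLI's internals):

```shell
# Precedence sketch: CLI flag beats repo config beats env default.
ENV_DEFAULT_TIMEOUT=60
CONFIG_TIMEOUT=90          # value read from .ai-review/config.json (hypothetical)
FLAG_TIMEOUT=""            # no --timeout flag passed on this run
timeout=${FLAG_TIMEOUT:-${CONFIG_TIMEOUT:-$ENV_DEFAULT_TIMEOUT}}
echo "perRunMinutes=$timeout"   # prints: perRunMinutes=90
```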
Secrets (API keys) never in config — only via CI env.
- stdout: single-line JSON only (for wrapper parsing).
- stderr: concise status lines (start/end, tool, branch, duration).
- Commit a small run summary JSON on the review branch and upload the full bundle as CI artifacts.
- `0` — OK
- `10` — bundle failed (git/diff)
- `20` — LLM timeout (90 min exceeded)
- `21` — LLM exited non-zero
- `30` — snapshot failed (branch/checkout)
- `40` — invalid PR context (env + git fallback failed)
Wrappers fail the job only on 10/20/21/30/40 (lenient policy). Otherwise succeed.
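A minimal sketch of this lenient policy as a wrapper-side shell function (`handle_exit` is a hypothetical helper name):

```shell
# Lenient exit policy sketch: fail the job only on the fatal codes listed above.
handle_exit() {
  case "$1" in
    10|20|21|30|40) echo "fatal: ai-review exit $1"; return 1 ;;
    *)              echo "lenient: continuing (exit $1)"; return 0 ;;
  esac
}
handle_exit 0                          # prints: lenient: continuing (exit 0)
handle_exit 20 || echo "job would fail"
```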
- LLM has no remote: wrapper renames/removes `origin`; the environment explains this to avoid confusion.
- Push creds exist only in the wrapper step.
- Fork PRs: skipped entirely (logs explain).
- Submodules: ignored (no init/update).
- Full diff is provided to LLM (no scrubbing) — by design because the LLM runs in the same workspace and can read the repo. Teams concerned about data egress should run the pipeline in a private network and/or adopt a local/self-hosted LLM later.
- License: BUSL-1.1.
- Registry: GHCR (public): `ghcr.io/<org>/ai-review`.
- Versioning: SemVer tags for CLI & image (`v0.1.0`) + digests; wrappers pin by digest.
- PR context resolution (env → git fallback).
- Diff generation flags (renames/binary/unified).
- Timestamp naming & branch creation.
- JSON stdout correctness.
- Exit codes for each failure path.
- Happy path on a synthetic repo with a small PR; simulate LLM by a stub script that commits twice.
- Large PR capping behavior: generate a diff over threshold → ensure “high-level only” note.
- Retry loop: stub LLM that fails tests on first run, passes on second (wrapper validates & re-invokes).
- Monorepo auto-detect: TS workspace + .NET project simultaneously.
- Bitbucket & ADO sample pipeline runs on a test repo (internal), pushing to a sandbox remote with read/write creds.
Rules for the process (baked into docs):
- Do exactly one prioritized task per iteration; before/after, run build/lint/tests.
- When adding/updating tests, include a brief “why this test matters” comment.
- Before adding functionality, search the codebase to avoid re-implementing.
- After each iteration, append a concise update to `docs/implementation-progress.md`.
- Use CI-friendly, non-interactive commands.
Here’s the detailed, bottom-up Iterations Dev Plan. Each iteration is exactly one prioritized task. Every iteration includes: scope, changes, checks, tests (Given/When/Then + “why this test matters”), and deliverables.
- Do one change per iteration. Before/after: build, lint, tests (CI-friendly, non-interactive).
- When adding/updating tests, include a brief “why this test matters” comment.
- Search first (e.g., `rg`/`git grep`) to avoid re-implementing what already exists.
- After each iteration, append a concise note to `docs/implementation-progress.md`.
- Prefer non-interactive tooling and deterministic flags so CI can run hands-off.
Goal: Create the monorepo structure, license, and CI placeholders.
- Changes
  - Create folders: `/cli/` (C# .NET 8 CLI, ai-review), `/docker/` (Dockerfile + build scripts), `/wrappers/bitbucket/`, `/wrappers/azure-devops/`, `/docs/`.
  - Add `LICENSE` (BUSL-1.1), `README.md`, `CODEOWNERS`, `.editorconfig`, `.gitignore`.
  - Add `docs/implementation-progress.md`.
- Checks: `dotnet --info`, `git --version` (in container later).
- Tests: none (scaffold).
- Deliverables: Initial commit with structure & docs.
Goal: Wire a minimal .NET 8 console app with command surface.
- Changes
  - `/cli/AiReview.Cli/` .NET 8 exe.
  - Commands: `run` with flags `--pr`, `--out`, `--timeout`, `--exec`.
  - Exit codes reserved: `0, 10, 20, 21, 30, 40`.
- Checks: `dotnet build -c Release`.
- Tests
  - G/W/T: Given missing required flags, When running `ai-review run`, Then the exit code is ≠ 0 and usage is shown. why: validates CLI argument parsing and prevents silent no-ops.
- Deliverables: Built binary + basic usage doc.
Goal: Normalize PR context for Bitbucket/ADO or fallback to git.
- Changes
  - Read well-known envs; map to canonical: `PR_ID`, `SOURCE_BRANCH`, `TARGET_BRANCH`, `HEAD_SHA`.
  - Fallback: `git fetch --prune`; resolve `HEAD_SHA`, `BASE_TIP`, `MERGE_BASE` via `git merge-base`.
- Checks: `git fetch --prune --quiet`.
- Tests
  - Env path: Given PR env vars set, When `run`, Then metadata in memory matches env. why: ensures portability across providers.
  - Fallback: Given no PR envs, When the repo has branches, Then `merge-base` is computed and not empty. why: resilience outside CI.
- Deliverables: Internal PR context service.
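A hedged sketch of the env-first resolution. `BITBUCKET_PR_ID` and `SYSTEM_PULLREQUEST_PULLREQUESTID` are, to the best of my knowledge, the documented Bitbucket Pipelines and Azure DevOps PR variables, but treat the mapping as illustrative rather than the CLI's exact logic:

```shell
# Env-first PR context sketch (assumed mapping, not the real service).
BITBUCKET_PR_ID=123      # simulate a Bitbucket Pipelines PR build
PR_ID=${BITBUCKET_PR_ID:-${SYSTEM_PULLREQUEST_PULLREQUESTID:-}}
if [ -z "$PR_ID" ]; then
  # Neither provider env present: the real CLI would fall back to
  # `git fetch --prune` + `git merge-base` here.
  echo "no PR env found; falling back to git" >&2
fi
echo "PR_ID=$PR_ID"      # prints: PR_ID=123
```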
Goal: Emit the review bundle to ./.ai-review/.
- Changes
  - Create the folder; write `metadata.json` with context + timestamps.
  - Write `diff.patch` using: `git diff --find-renames --find-copies --binary --unified=5 <merge-base>...<head>`
- Checks: File existence, UTF-8, non-colored diff.
- Tests
  - Given a simple repo change, When `run`, Then `.ai-review/diff.patch` contains the expected file/hunk headers. why: guarantees deterministic input for the LLM.
- Deliverables: Bundle writer component.
Goal: Generate the approved INSTRUCTIONS.md.
- Changes
  - Write the static text with injected branch name, PR #, timestamp, review path.
- Checks: File exists; placeholders resolved.
- Tests
  - Given config and PR info, Then the instructions contain the resolved `<REVIEW_BRANCH>` and review file path. why: prevents LLM confusion / miswrites.
- Deliverables: Instructions generator.
Goal: Create and check out ai-review/pr-<PR>-t<YYYYMMDD-HHMMSS>.
- Changes
  - Timestamped SEQ (UTC `YYYYMMDD-HHMMSS`); create the branch from the PR `HEAD_SHA` via `git checkout -b`.
- Checks: `git rev-parse --abbrev-ref HEAD` equals the snapshot name.
- Tests
  - Given HEAD at the PR tip, When creating the snapshot, Then the branch points to the same commit. why: ensures cherry-pickable history.
- Deliverables: Branch naming & checkout service.
Goal: Execute --exec "<LLM CLI ...>", enforce timeout, capture exit.
- Changes
  - Spawn the process; stream its stderr to CLI stderr; enforce a 90-min timeout (kill on exceed).
  - Pre-run: set local git identity `AI Reviewer Bot <ai-review@local>`.
- Checks: Process exit code; the timeout path returns `20`.
- Tests
  - Success script exits 0 → CLI exit 0. why: happy path.
  - Failure script exits non-zero → CLI exit `21`. why: error mapping.
  - Sleep script > 90 min (simulated) → CLI exit `20`. why: watchdog works.
- Deliverables: Orchestration layer.
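The watchdog behavior can be sketched with coreutils `timeout`, mapping expiry to exit `20` and a plain failure to `21`. One second stands in for the 90-minute cap; the real CLI does this in-process in C#, so this is only an analogy:

```shell
# Watchdog analogy: timeout(1) reports 124 when the command is killed on expiry.
timeout 1 sleep 5
rc=$?
if [ "$rc" -eq 124 ]; then code=20      # LLM timeout -> reserved exit 20
elif [ "$rc" -ne 0 ]; then code=21      # LLM exited non-zero -> 21
else code=0
fi
echo "exit-code=$code"                  # prints: exit-code=20
```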
Goal: Append commit subjects + short SHAs to the review file.
- Changes
  - Record `START=$(git rev-parse HEAD)` before the LLM runs.
  - After exit: list `git log --format="%H%x09%s" --no-merges --reverse START..HEAD`.
  - Open/create `.ai-reviews/review-pr-<PR>-t<TS>.md`; write the Applied fixes section entries.
- Checks: `git merge-base --is-ancestor START HEAD` sanity; else warn.
- Tests
  - Given two commits by the stub, When the run ends, Then the report lists both subjects with 7-char SHAs. why: traceability for cherry-pick.
- Deliverables: Report enricher.
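A sketch of this step on a throwaway repo, with two commits standing in for the LLM's work (repo contents and subjects are made up):

```shell
# Report-enrichment sketch: record START, commit twice, list START..HEAD.
set -eu
REPO=$(mktemp -d); cd "$REPO"
git init -q -b main
git config user.email bot@local; git config user.name "AI Reviewer Bot"
echo a > f; git add f; git commit -qm "base"
START=$(git rev-parse HEAD)
echo b >> f; git commit -qam "fix: tighten null check [skip ci]"
echo c >> f; git commit -qam "fix: add missing await [skip ci]"
# Short SHA + subject, oldest first — the Applied fixes entries.
git log --format="%h%x09%s" --no-merges --reverse "$START..HEAD"
n=$(git log --oneline "$START..HEAD" | wc -l)
echo "commits=$n"
```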
Goal: Emit one-line JSON on stdout, logs on stderr.
- Changes
  - Print `{"status":"ok","reviewBranch":"...","prNumber":...,"timestamp":"...","commits":N,"reviewFile":"...","durationSec":...}`.
- Checks: stdout has exactly one JSON line; nothing extra.
- Tests
  - Given a successful run, Then stdout parses via `jq` and contains the branch + review file. why: wrappers can parse deterministically.
- Deliverables: Output contract finalized.
Goal: Build GHCR image with all runtimes & tools.
- Changes
  - `docker/Dockerfile`: Ubuntu 24.04; Git + Git-LFS; .NET 8; Node 20 (+ pnpm, yarn, eslint, prettier, jest); Python 3.11 (+ ruff, black, pytest); Java 21 (+ maven, gradle); Go 1.22 (+ golangci-lint); Rust stable (+ rustfmt, clippy); `jq`; Codex CLI; `ai-review` preinstalled.
  - Non-root user; `known_hosts` for Bitbucket & Azure DevOps.
- Checks: `docker build`; `docker run` sanity (`ai-review --help`).
- Tests
  - In the container: tools report versions; `ai-review` runs. why: reproducible CI runtime.
- Deliverables: Published image to GHCR (local tag for now).
Goal: YAML step that runs CLI and pushes the branch.
- Changes
  - `/wrappers/bitbucket/bitbucket-pipelines.yml` minimal example.
  - Auto-select push method (SSH key or app password).
  - Exclude `ai-review/**` from triggers.
- Checks: Dry-run in a test repo; confirm `git push origin <branch>`.
- Tests
  - Given a sandbox repo, Then the branch appears on the remote with the expected name. why: end-to-end viability on Bitbucket.
- Deliverables: Working wrapper snippet + README.
Goal: YAML step for ADO that runs CLI and pushes with OAuth.
- Changes
  - `/wrappers/azure-devops/azure-pipelines.yml` minimal example.
  - `checkout: self` with `persistCredentials: true`.
  - Exclude `ai-review/**` triggers.
- Checks: Dry-run; confirm the push works.
- Tests
  - Given a sandbox project, Then the branch is pushed and visible. why: parity across providers.
- Deliverables: Wrapper + README.
Goal: Wrapper logic: up to 3 total runs until green; else push anyway.
- Changes
  - Auto-detect commands per workspace (JS/TS, .NET, Python, Go, Rust, Java).
  - On failure: write `./.ai-review/feedback.json` (top errors + counts), invoke a second `ai-review run --exec ...`, up to 3 total.
- Checks: Wrapper exit policy is lenient (only infra errors fail).
- Tests
  - Stub LLM: the first run leaves failing tests, the second fixes them → the wrapper does 2 runs and pushes. why: validates the feedback-driven loop without flake.
- Deliverables: Shared bash functions + doc.
Goal: Bound LLM input size; fall back to summary-only when too large.
- Changes
  - Count diff lines/files over code/text types; if above the thresholds (e.g., 5k lines or 200 files), write a header notice and skip code edits.
- Checks: Thresholds configurable via config later (keep defaults for now).
- Tests
  - Synthetic large diff → the report contains a “too large” note, no edits attempted. why: avoids runaway cost/time.
- Deliverables: Size gate in the bundle stage.
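A rough sketch of the gate, using the thresholds from the plan; the pre-filter to code/text types is omitted for brevity, and the tiny hand-written patch is just test input:

```shell
# Size-gate sketch: count files and total diff lines against the plan's caps.
MAX_LINES=5000; MAX_FILES=200
cat > diff.patch <<'EOF'
diff --git a/a.txt b/a.txt
--- a/a.txt
+++ b/a.txt
@@ -1 +1,2 @@
 old
+new
EOF
files=$(grep -c '^diff --git ' diff.patch)
lines=$(wc -l < diff.patch)
if [ "$files" -gt "$MAX_FILES" ] || [ "$lines" -gt "$MAX_LINES" ]; then
  verdict="too-large: high-level notes only"   # write notice, skip code edits
else
  verdict="within-caps"
fi
echo "$verdict"
```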
Goal: Detect touched roots for validation run (auto-or-root).
- Changes
  - Parse `diff.patch` to collect paths; detect the nearest workspace/project markers; dedupe roots.
- Checks: Log detected roots to stderr (concise).
- Tests
  - PR touches `app-a/` (JS) and `svc-b/` (.NET) → both validation commands run once. why: targeted validation, faster CI.
- Deliverables: Workspace detector module.
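An illustrative first pass at root collection: pull paths out of the `diff --git` headers and keep the first path segment as the workspace root. The real detector would additionally walk up looking for markers such as `package.json` or `*.csproj`; the sample patch is made up.

```shell
# Root-detection sketch: extract the top-level directory per touched file, dedupe.
cat > diff.patch <<'EOF'
diff --git a/app-a/src/index.ts b/app-a/src/index.ts
diff --git a/svc-b/Program.cs b/svc-b/Program.cs
diff --git a/app-a/package.json b/app-a/package.json
EOF
roots=$(sed -n 's|^diff --git a/\([^/]*\)/.*|\1|p' diff.patch | sort -u)
echo "$roots"
```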
Goal: Ensure commits have [skip ci] and pipeline excludes review branches.
- Changes
  - CLI: validate that each new commit subject contains `[skip ci]`; if not, append a note to the review file and a stderr warning (do not rewrite).
  - Wrappers: confirm trigger exclusion patterns in YAML.
- Checks: Create a dummy commit without the tag → the warning appears.
- Tests
  - Given a commit subject missing `[skip ci]`, Then a warning block is added at the top of the review file. why: protects against loops without mutating commits.
- Deliverables: Guardrail + docs.
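The guardrail can be sketched as a subject check that warns without rewriting (`check_subject` is a made-up helper name; the real CLI does this in C#):

```shell
# Guardrail sketch: warn on a subject missing [skip ci]; never mutate the commit.
check_subject() {
  case "$1" in
    *"[skip ci]"*) return 0 ;;
    *) echo "WARNING: commit subject missing [skip ci]: $1" >&2; return 1 ;;
  esac
}
check_subject "fix: handle empty diff [skip ci]" && echo ok   # prints: ok
check_subject "fix: forgot the tag" || echo warned            # prints: warned
```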
Goal: Skip forked PRs (no secrets/push); print a clear reason.
- Changes
  - Detect fork context (provider env or repo owner mismatch); exit early with status OK and a note to stderr.
- Checks: Wrapper respects the skip and exits 0 (lenient).
- Tests
  - Simulated fork env → no work done, clear log line. why: secure default behavior for external PRs.
- Deliverables: Skip policy implemented.
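An illustrative owner-mismatch check. The repo slugs are hypothetical, and real detection would rely on provider env or a repo owner mismatch, per the plan:

```shell
# Fork-skip sketch: compare the owner segment of source vs target repo slugs.
SOURCE_REPO="outsider/tool"     # hypothetical fork
TARGET_REPO="acme/tool"         # hypothetical upstream
if [ "${SOURCE_REPO%%/*}" != "${TARGET_REPO%%/*}" ]; then
  echo "skip: fork PR (no secrets/push for external repos)" >&2
  status=skipped                 # exit 0 either way (lenient policy)
else
  status=run
fi
echo "$status"                   # prints: skipped
```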
Goal: Don’t init/update submodules; surface pointer change notes.
- Changes
  - If `.gitmodules` or a submodule SHA changed, add a short “not reviewed” note to the review file.
- Checks: Ensure no `git submodule` calls.
- Tests
  - PR that bumps a submodule pointer → the note is present. why: clarity without complexity.
- Deliverables: Clear behavior for submodules.
Goal: Persist results for observability.
- Changes
  - Commit `.ai-reviews/run-pr-<PR>-t<TS>.json` with counts/timings/tool.
  - Upload the entire `./.ai-review/` as a CI artifact (the wrappers do the upload).
- Checks: File present on the branch; artifact visible in the CI UI.
- Tests
  - Parsed run JSON fields (`reviewBranch`, `durationSec`) exist and are sane. why: enables debugging without logs.
- Deliverables: Summary writer + wrapper artifact step.
Goal: Build & publish image with SemVer tags and digest output.
- Changes
  - GitHub Actions (or a local script) to build/push `ghcr.io/<org>/ai-review:v0.1.0` with `v0.1`, `v0`, `latest`.
  - Emit the immutable digest; update wrapper YAML to show pinning by digest (a comment retains the human tag).
- Checks: `docker pull` by digest succeeds.
- Tests
  - A dry-run pipeline references the digest and starts. why: reproducibility across teams.
- Deliverables: Release doc + version bump checklist.
Goal: Final polish for developers/operators.
- Changes
  - `README`: quickstart, CI snippets (Bitbucket/ADO), configuration, FAQ.
  - `/wrappers/*/README`: provider-specific notes (tokens/SSH).
  - Troubleshooting: timeouts, large PR, fork skip, missing `[skip ci]`.
- Checks: Links & commands validated.
- Tests
  - N/A (docs), but run a smoke E2E once on each provider.
- Deliverables: Dev-ready docs; `docs/implementation-progress.md` updated.
- Add adapters for Gemini CLI and Copilot CLI as `--exec` presets.
- Configurable policies in `.ai-review/config.json` (caps, deny globs, timestamp format).
- Platform adapters (Bitbucket/ADO) for native comments/status checks (still optional).
- Local-first / self-hosted LLM mode.