Skip to content

Git-only AI pull-request reviewer that runs in CI and writes a separate review branch with a REVIEW.md and small, cherry-pickable fix commits

License

Notifications You must be signed in to change notification settings

cs-util-com/GitOnlyAiAgents

 
 

Repository files navigation

AI Pull Request System — Developer Specification (MVP)

1) Scope & goals

  • Goal: Portable, Git-only AI reviewer that runs in CI on PR creation and on-demand, producing a separate review snapshot branch with:

    • a human-readable review file, and
    • small, cherry-pickable commits with [skip ci] to avoid loops.
  • Targets: Bitbucket Pipelines & Azure DevOps Pipelines via thin wrappers around a shared C# CLI (one-shot orchestration).

  • LLM tools (MVP order): OpenAI Codex CLI first; Gemini CLI & Copilot CLI supported next via the same --exec hook.

Non-goals (MVP)

  • No platform-native PR comments/status checks.
  • No remote pushes from the LLM (wrappers push only).
  • No scrubbing/filtering of diffs sent to the LLM (full diff by design).
  • No support for fork PRs (skipped with a clear log).
  • No submodule editing (ignored).

2) High-level flow

  1. Trigger

    • Auto on PR open; later runs on demand via manual pipeline run.
  2. C# CLI (one-shot) — ai-review run

    • Detect PR context (env → git fallback).
    • git fetch --prune; compute merge-base.
    • Create ./.ai-review/metadata.json + diff.patch.
    • Determine timestamped SEQ tYYYYMMDD-HHMMSS.
    • Create & checkout ai-review/pr-<PR>-t<TS>.
    • Write ./.ai-review/INSTRUCTIONS.md (approved minimal+guardrails).
    • Launch LLM CLI via --exec "codex …" (90-min timeout), with no remote.
    • On exit, list commits in START..HEAD; append SHAs & subjects into the review file.
    • Print single-line JSON to stdout: {"status":"ok","reviewBranch":"ai-review/pr-123-t20251002-143312", "prNumber":123, "timestamp":"20251002-143312", "commits":7, "reviewFile":".ai-reviews/review-pr-123-t20251002-143312.md", "durationSec":...}
  3. Wrapper (Bitbucket/ADO)

    • Parse JSON → REVIEW_BRANCH.
    • Run validation loop (auto-detect lint + tests) up to 3 total LLM runs (no global time cap; 90-min per run). Between runs, write ./.ai-review/feedback.json and rerun ai-review run --exec ... if not green.
    • Push REVIEW_BRANCH using SSH or HTTPS token (auto-select).
    • Upload ./.ai-review/ as CI artifact.
    • Exit policy: lenient — job fails only on fatal infra errors; otherwise succeed and note failures at top of the review file.

3) Artifacts & layout

  • Snapshot branch: ai-review/pr-<PR>-t<YYYYMMDD-HHMMSS> (no auto-deletion).

  • Review file (in repo): .ai-reviews/review-pr-<PR>-t<YYYYMMDD-HHMMSS>.md (immutable per run).

  • Bundle (CI artifacts): ./.ai-review/

    • metadata.json:

      {
        "prNumber": 123,
        "sourceBranch": "feature/xyz",
        "targetBranch": "main",
        "baseSha": "...",
        "mergeBaseSha": "...",
        "headSha": "...",
        "createdAt": "2025-10-09T13:20:00Z",
        "runTimestamp": "20251009-132000",
        "reviewBranch": "ai-review/pr-123-t20251009-132000",
        "repo": {"remoteUrl":"...", "defaultBranch":"main"},
        "ci": {"provider":"bitbucket|azure-devops","runUrl":"..."},
        "llmGuidance": {
          "commitStyle":"small, atomic, cherry-pickable",
          "includeReviewMd": true,
          "limits": {"maxFilesHint":10,"maxLocHint":400,"maxHunkLocHint":120}
        },
        "commands": {
          "diff":"git diff --find-renames --find-copies --binary --unified=5 <merge-base>...<head>"
        }
      }
    • diff.patch (full unified diff).

    • INSTRUCTIONS.md (approved text).

    • Optional across retries: feedback.json (wrapper writes).


4) Review report template (LLM fills body; CLI injects SHAs)

Header (CLI):

  • PR #<id>, Timestamp (UTC), Tool (name/version), AI commits this run: N, base/head short SHAs

Sections:

  • Summary (3–6 bullets) — LLM
  • Applied fixesCLI lists commit subject + short SHA; LLM rationale under each (1–3 lines PR-review tone)
  • Suggestions (manual) — LLM bullets
  • Risks & follow-ups — LLM short list

5) Commit policy (LLM)

  • Subject: fix: <summary> [skip ci] (≤72 chars).
  • Body: up to 10 lines, sounds like a PR comment (what/why/risks).
  • Prefer PR-touched files; broaden only if clearly necessary.
  • Deny touching: .git/**, CI folders, node_modules/**, dist/**, vendor/**, **/*.lock, **/*.min.*, common binaries.

6) Large PR handling

  • Pre-filter the diff to code/text types for the purpose of size caps; then cap at ~5,000 lines or 200 files.
  • If still over cap → write a short notice & generate only high-level notes; avoid code edits.

7) Fork PR policy

  • Skip entirely (no run); print a clear log line explaining skip due to secrets/push restrictions.

8) CI wrappers (minimal)

Bitbucket Pipelines (snippet)

image: ghcr.io/<org>/ai-review:v0.1.0@sha256:<digest>
pipelines:
  pull-requests:
    '**':
      - step:
          name: AI Review
          script:
            - RUN_OUT=$(ai-review run --exec "codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" --timeout 90m --out ./.ai-review)
            - echo "$RUN_OUT"
            - REVIEW_BRANCH=$(echo "$RUN_OUT" | jq -r '.reviewBranch')
            # Retry loop (up to 3 total runs incl. first): auto-detect lint/tests; create feedback.json; re-run `ai-review run` if not green.
            # (Implementation lives in a shared bash function sourced here.)
            - git push origin "$REVIEW_BRANCH"
          # Pipeline config should exclude ai-review/** branches from triggers

Azure DevOps (snippet)

pool: { vmImage: 'ubuntu-latest' }
container: ghcr.io/<org>/ai-review:v0.1.0@sha256:<digest>
pr: { branches: { include: ['*'] } }
steps:
  - checkout: self
    persistCredentials: true
  - bash: |
      RUN_OUT=$(ai-review run --exec "codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" --timeout 90m --out ./.ai-review)
      echo "$RUN_OUT"
      REVIEW_BRANCH=$(echo "$RUN_OUT" | jq -r '.reviewBranch')
      # Retry loop as above...
      git push origin "$REVIEW_BRANCH"
    displayName: AI Review
# Add trigger exclusions for ai-review/** and rely on [skip ci] in subjects

Push creds: wrappers auto-select SSH (deploy key/agent) or HTTPS token (ADO OAuth System.AccessToken, Bitbucket App Password). Loop avoidance: [skip ci] and exclude ai-review/** from triggers.


9) Docker image (GHCR public)

  • Base: Ubuntu 24.04.
  • Includes: Git, Git-LFS, .NET 8 SDK, jq, ai-review CLI, Codex CLI, Node 20 LTS (+ pnpm, yarn, eslint, prettier, jest), Python 3.11 (+ ruff, black, pytest), Java 21 (Temurin, maven, gradle), Go 1.22 (+ golangci-lint), Rust stable (+ rustfmt, clippy).
  • Non-root user; pre-populated known_hosts for bitbucket.org and ssh.dev.azure.com.

10) Configuration & precedence

  • File: .ai-review/config.json (extended + lightweight policies)

    {
      "llm": { "tool":"codex", "exec":"codex --non-interactive --prompt-file ./.ai-review/INSTRUCTIONS.md" },
      "timeouts": { "perRunMinutes": 90 },
      "naming": { "branchPrefix":"ai-review", "reviewDir":".ai-reviews", "timestampFormat":"YYYYMMDD-HHMMSS" },
      "policies": {
        "deny": ["**/*.png","**/*.jpg","**/*.exe","**/*.dll","**/*.so","**/*.pdf"],
        "caps": { "maxFiles":10, "maxLoc":400, "maxHunkLoc":120 }
      },
      "workspaces": { "mode":"auto-or-root" },
      "validation": {
        "autoDetectLintAndTests": true,
        "lintCommand": "",
        "testCommand": ""
      }
    }
  • Precedence: CLI flags ▶ repo config ▶ env defaults.

  • Secrets (API keys) never in config — only via CI env.


11) Logging & outputs

  • stdout: single-line JSON only (for wrapper parsing).
  • stderr: concise status lines (start/end, tool, branch, duration).
  • Commit a small run summary JSON on the review branch and upload the full bundle as CI artifacts.

12) Error handling & exit codes (CLI)

  • 0 OK
  • 10 bundle failed (git/diff)
  • 20 LLM timeout (90 min exceeded)
  • 21 LLM exited non-zero
  • 30 snapshot failed (branch/checkout)
  • 40 invalid PR context (env + git fallback failed)

Wrappers fail the job only on 10/20/21/30/40 (lenient policy). Otherwise succeed.


13) Security posture

  • LLM has no remote: wrapper renames/removes origin; environment explains this to avoid confusion.
  • Push creds exist only in wrapper step.
  • Fork PRs: skipped entirely (logs explain).
  • Submodules: ignored (no init/update).
  • Full diff is provided to LLM (no scrubbing) — by design because the LLM runs in the same workspace and can read the repo. Teams concerned about data egress should run the pipeline in a private network and/or adopt a local/self-hosted LLM later.

14) Release & distribution

  • License: BUSL-1.1.
  • Registry: GHCR (public): ghcr.io/<org>/ai-review.
  • Versioning: SemVer tags for CLI & image (v0.1.0) + digests; wrappers pin by digest.

15) Testing plan

Unit tests (CLI)

  • PR context resolution (env → git fallback).
  • Diff generation flags (renames/binary/unified).
  • Timestamp naming & branch creation.
  • JSON stdout correctness.
  • Exit codes for each failure path.

Integration tests (containerized)

  • Happy path on a synthetic repo with a small PR; simulate LLM by a stub script that commits twice.
  • Large PR capping behavior: generate a diff over threshold → ensure “high-level only” note.
  • Retry loop: stub LLM that fails tests on first run, passes on second (wrapper validates & re-invokes).
  • Monorepo auto-detect: TS workspace + .NET project simultaneously.

E2E (CI dry-runs)

  • Bitbucket & ADO sample pipeline runs on a test repo (internal), pushing to a sandbox remote with read/write creds.

16) Iterative implementation plan (bottom-up, one task per iteration)

Rules for the process (baked into docs):

  1. Do exactly one prioritized task per iteration; before/after, run build/lint/tests.
  2. When adding/updating tests, include a brief “why this test matters” comment.
  3. Before adding functionality, search the codebase to avoid re-implementing.
  4. After each iteration, append a concise update to docs/implementation-progress.md.
  5. Use CI-friendly, non-interactive commands.

Here’s the detailed, bottom-up Iterations Dev Plan. Each iteration is exactly one prioritized task. Every iteration includes: scope, changes, checks, tests (Given/When/Then + “why this test matters”), and deliverables.


Ground rules (apply to every iteration)

  • Do one change per iteration. Before/after: build, lint, tests (CI-friendly, non-interactive).
  • When adding/updating tests, include a brief “why this test matters” comment.
  • Search first (e.g., rg/git grep) to avoid re-implementing what already exists.
  • After each iteration, append a concise note to docs/implementation-progress.md.
  • Prefer non-interactive tooling and deterministic flags so CI can run hands-off.

Iteration 0 — Repository scaffold

Goal: Create the monorepo structure, license, and CI placeholders.

  • Changes

    • Create folders:

      /cli/            # C# .NET 8 CLI (ai-review)
      /docker/         # Dockerfile + build scripts
      /wrappers/bitbucket/
      /wrappers/azure-devops/
      /docs/
      
    • Add LICENSE (BUSL-1.1), README.md, CODEOWNERS, .editorconfig, .gitignore.

    • Add docs/implementation-progress.md.

  • Checks: dotnet --info, git --version (in container later).

  • Tests: none (scaffold).

  • Deliverables: Initial commit with structure & docs.


Iteration 1 — CLI skeleton (ai-review)

Goal: Wire a minimal .NET 8 console app with command surface.

  • Changes

    • /cli/AiReview.Cli/ .NET 8 exe.
    • Commands: run with flags --pr, --out, --timeout, --exec.
    • Exit codes reserved: 0, 10, 20, 21, 30, 40.
  • Checks: dotnet build -c Release.

  • Tests

    • G/W/T: Given missing required flags, When running ai-review run, Then exit code ≠ 0 and shows usage. why: validates CLI argument parsing and prevents silent no-ops.
  • Deliverables: Built binary + basic usage doc.


Iteration 2 — PR context resolution (env → git fallback)

Goal: Normalize PR context for Bitbucket/ADO or fallback to git.

  • Changes

    • Read well-known envs; map to canonical: PR_ID, SOURCE_BRANCH, TARGET_BRANCH, HEAD_SHA.
    • Fallback: git fetch --prune; resolve HEAD_SHA, BASE_TIP, MERGE_BASE via git merge-base.
  • Checks: git fetch --prune --quiet.

  • Tests

    • Env path: Given PR env vars set, When run, Then metadata in memory matches env. why: ensures portability across providers.
    • Fallback: Given no PR envs, When repo has branches, Then merge-base is computed and not empty. why: resilience outside CI.
  • Deliverables: Internal PR context service.


Iteration 3 — Bundle writer (metadata.json + diff.patch)

Goal: Emit the review bundle to ./.ai-review/.

  • Changes

    • Create folder; write metadata.json with context + timestamps.
    • Write diff.patch using: git diff --find-renames --find-copies --binary --unified=5 <merge-base>...<head>
  • Checks: File existence, UTF-8, non-colored diff.

  • Tests

    • Given a simple repo change, When run, Then .ai-review/diff.patch contains expected file/hunk headers. why: guarantees deterministic input for LLM.
  • Deliverables: Bundle writer component.


Iteration 4 — INSTRUCTIONS.md emitter (minimal + guardrails)

Goal: Generate the approved INSTRUCTIONS.md.

  • Changes

    • Write the static text with injected branch name, PR #, timestamp, review path.
  • Checks: File exists; placeholders resolved.

  • Tests

    • Given config and PR info, Then instructions contain resolved <REVIEW_BRANCH> and review file path. why: prevents LLM confusion / miswrites.
  • Deliverables: Instructions generator.


Iteration 5 — Snapshot branch creation & checkout

Goal: Create and check out ai-review/pr-<PR>-t<YYYYMMDD-HHMMSS>.

  • Changes

    • Timestamped SEQ (UTC YYYYMMDD-HHMMSS), create branch from PR HEAD_SHA, git checkout -b.
  • Checks: git rev-parse --abbrev-ref HEAD equals snapshot name.

  • Tests

    • Given HEAD at PR tip, When creating snapshot, Then branch points to same commit. why: ensures cherry-pickable history.
  • Deliverables: Branch naming & checkout service.


Iteration 6 — LLM orchestration (spawn + 90-min timeout)

Goal: Execute --exec "<LLM CLI ...>", enforce timeout, capture exit.

  • Changes

    • Spawn process; stream stderr to CLI stderr; enforce 90m timeout (kill on exceed).
    • Pre-run: set local git identity AI Reviewer Bot <ai-review@local>.
  • Checks: Process exit code; timeout path returns 20.

  • Tests

    • Success script exits 0 → CLI exit 0. why: happy path.
    • Failure script exits non-zero → CLI exit 21. why: error mapping.
    • Sleep script > 90m simulated → CLI exit 20. why: watchdog works.
  • Deliverables: Orchestration layer.


Iteration 7 — Commit enumeration & report enrichment

Goal: Append commit subjects + short SHAs to the review file.

  • Changes

    • Record START=git rev-parse HEAD before LLM.
    • After exit: list git log --format="%H%x09%s" --no-merges --reverse START..HEAD.
    • Open/create .ai-reviews/review-pr-<PR>-t<TS>.md; write Applied fixes section entries.
  • Checks: merge-base --is-ancestor START HEAD sanity; else warn.

  • Tests

    • Given two commits by stub, When run ends, Then report lists both subjects with 7-char SHAs. why: traceability for cherry-pick.
  • Deliverables: Report enricher.


Iteration 8 — JSON stdout & structured exit codes

Goal: Emit one-line JSON on stdout, logs on stderr.

  • Changes

    • Print {"status":"ok","reviewBranch":"...", "prNumber":..., "timestamp":"...", "commits":N, "reviewFile":"...", "durationSec":...}.
  • Checks: stdout has exactly one JSON line; no extra.

  • Tests

    • Given a successful run, Then stdout parses via jq and contains branch + review file. why: wrappers can parse deterministically.
  • Deliverables: Output contract finalized.


Iteration 9 — Docker image (polyglot + tools)

Goal: Build GHCR image with all runtimes & tools.

  • Changes

    • docker/Dockerfile: Ubuntu 24.04; Git + Git-LFS; .NET 8; Node 20 (+ pnpm, yarn, eslint, prettier, jest); Python 3.11 (+ ruff, black, pytest); Java 21 (+ maven, gradle); Go 1.22 (+ golangci-lint); Rust stable (+ rustfmt, clippy); jq; Codex CLI; ai-review preinstalled.
    • Non-root user; known_hosts for Bitbucket & Azure DevOps.
  • Checks: docker build; docker run sanity (ai-review --help).

  • Tests

    • In container: tools report versions; ai-review runs. why: reproducible CI runtime.
  • Deliverables: Published image to GHCR (local tag for now).


Iteration 10 — Bitbucket wrapper (minimal push)

Goal: YAML step that runs CLI and pushes the branch.

  • Changes

    • /wrappers/bitbucket/bitbucket-pipelines.yml minimal example.
    • Auto-select push method (SSH key or app password).
    • Exclude ai-review/** from triggers.
  • Checks: Dry-run in test repo; confirm git push origin <branch>.

  • Tests

    • Given sandbox repo, Then branch appears on remote with expected name. why: end-to-end viability on Bitbucket.
  • Deliverables: Working wrapper snippet + README.


Iteration 11 — Azure DevOps wrapper (minimal push)

Goal: YAML step for ADO that runs CLI and pushes with OAuth.

  • Changes

    • /wrappers/azure-devops/azure-pipelines.yml minimal example.
    • checkout: self with persistCredentials: true.
    • Exclude ai-review/** triggers.
  • Checks: Dry-run; confirm push works.

  • Tests

    • Given sandbox project, Then branch is pushed and visible. why: parity across providers.
  • Deliverables: Wrapper + README.


Iteration 12 — Validation retry loop (auto-detect lint/tests)

Goal: Wrapper logic: up to 3 total runs until green; else push anyway.

  • Changes

    • Auto-detect commands per workspace (JS/TS, .NET, Python, Go, Rust, Java).
    • On failure: write ./.ai-review/feedback.json (top errors + counts), invoke a second ai-review run --exec ..., up to 3 total.
  • Checks: Wrapper exit policy lenient (only infra errors fail).

  • Tests

    • Stub LLM: first run leaves failing tests, second fixes → wrapper does 2 runs and pushes. why: validates feedback-driven loop without flake.
  • Deliverables: Shared bash functions + doc.


Iteration 13 — Large PR handling (pre-filter + cap)

Goal: Bound LLM input size; fall back to summary-only when too large.

  • Changes

    • Count diff lines/files on code/text types; if > thresholds (e.g., 5k lines or 200 files), write a header notice and skip code edits.
  • Checks: Thresholds configurable via config later (keep defaults now).

  • Tests

    • Synthetic large diff → report contains “too large” note, no edits attempted. why: avoids runaway cost/time.
  • Deliverables: Size gate in bundle stage.


Iteration 14 — Monorepo workspace detection

Goal: Detect touched roots for validation run (auto-or-root).

  • Changes

    • Parse diff.patch to collect paths; detect nearest workspace/project markers; dedupe roots.
  • Checks: Log detected roots to stderr (concise).

  • Tests

    • PR touches app-a/ (JS) and svc-b/ (.NET) → both validation commands run once. why: targeted validation, faster CI.
  • Deliverables: Workspace detector module.


Iteration 15 — CI loop avoidance & commit policy enforcement

Goal: Ensure commits have [skip ci] and pipeline excludes review branches.

  • Changes

    • CLI: validate each new commit subject contains [skip ci]; if not, append note to review file and stderr warning (do not rewrite).
    • Wrappers: confirm trigger exclusion patterns in YAML.
  • Checks: Create a dummy commit without tag → warning appears.

  • Tests

    • Given a commit subject missing [skip ci], Then a warning block is added at top of review file. why: protects against loops without mutating commits.
  • Deliverables: Guardrail + docs.


Iteration 16 — Fork PR skip

Goal: Skip forked PRs (no secrets/push); print a clear reason.

  • Changes

    • Detect fork context (provider env or repo owner mismatch); exit early with status OK and note to stderr.
  • Checks: Wrapper respects skip and exits 0 (lenient).

  • Tests

    • Simulated fork env → no work done, clear log line. why: secure default behavior for external PRs.
  • Deliverables: Skip policy implemented.


Iteration 17 — Submodules: ignore safely

Goal: Don’t init/update submodules; surface pointer change notes.

  • Changes

    • If .gitmodules or submodule SHA changed, add a short “not reviewed” note to the review file.
  • Checks: Ensure no git submodule calls.

  • Tests

    • PR that bumps submodule pointer → note is present. why: clarity without complexity.
  • Deliverables: Clear behavior for submodules.


Iteration 18 — Run summary & artifact publication

Goal: Persist results for observability.

  • Changes

    • Commit .ai-reviews/run-pr-<PR>-t<TS>.json with counts/timings/tool.
    • Upload entire ./.ai-review/ as CI artifact (wrappers do upload).
  • Checks: File present on branch; artifact visible in CI UI.

  • Tests

    • Parse run JSON fields (reviewBranch, durationSec) exist and are sane. why: enables debugging without logs.
  • Deliverables: Summary writer + wrapper artifact step.


Iteration 19 — Release plumbing (SemVer + GHCR)

Goal: Build & publish image with SemVer tags and digest output.

  • Changes

    • GitHub Actions (or local script) to build/push ghcr.io/<org>/ai-review:v0.1.0 with v0.1, v0, latest.
    • Emit immutable digest; update wrapper YAML to show pin by digest (comment retains human tag).
  • Checks: docker pull by digest succeeds.

  • Tests

    • Dry-run pipeline references the digest and starts. why: reproducibility across teams.
  • Deliverables: Release doc + version bump checklist.


Iteration 20 — Documentation & hardening pass

Goal: Final polish for developers/operators.

  • Changes

    • README: quickstart, CI snippets (Bitbucket/ADO), configuration, FAQ.
    • /wrappers/*/README: provider-specific notes (tokens/SSH).
    • Troubleshooting: timeouts, large PR, fork skip, missing [skip ci].
  • Checks: Links & commands validated.

  • Tests

    • N/A (docs), but run a smoke E2E once on each provider.
  • Deliverables: Dev-ready docs; docs/implementation-progress.md updated.


Optional follow-ups (post-MVP)

  • Add adapters for Gemini CLI and Copilot CLI to --exec presets.
  • Configurable policies in .ai-review/config.json (caps, deny globs, timestamp format).
  • Platform adapters (Bitbucket/ADO) for native comments/status checks (still optional).
  • Local-first / self-hosted LLM mode.

About

Git-only AI pull-request reviewer that runs in CI and writes a separate review branch with a REVIEW.md and small, cherry-pickable fix commits

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • JavaScript 76.1%
  • HTML 23.9%