
Add Codex research workflow and review tooling #453

Open
TrueCrimeDev wants to merge 6 commits into karpathy:master from TrueCrimeDev:review/autoresearch/consolidated

Conversation

@TrueCrimeDev

Summary

  • add a Codex-powered research runner and heavy-equipment research workflow
  • add dataset build/export automation with targeted image follow-up support
  • add local dashboard, snapshot, PR planning, and review-branch tooling

Included Files

  • README.md
  • program.md
  • specs/heavy_construction_industrial_equipment.md
  • job_templates/heavy-equipment-phase1-seed.json
  • job_templates/heavy-equipment-phase1-exhaustive.json
  • scripts/run_research.py
  • scripts/build_heavy_equipment_dataset.py
  • scripts/export_heavy_equipment_report.py
  • scripts/research_dashboard.py
  • scripts/research_pr_plan.py
  • scripts/research_snapshot.py
  • scripts/research_review_branches.py

Verification

  • python -m py_compile scripts\run_research.py scripts\build_heavy_equipment_dataset.py scripts\export_heavy_equipment_report.py scripts\research_dashboard.py scripts\research_pr_plan.py scripts\research_snapshot.py scripts\research_review_branches.py
  • python scripts\build_heavy_equipment_dataset.py --help
  • python scripts\research_dashboard.py summary --limit-runs 6 --limit-datasets 3
  • python scripts\research_dashboard.py export-html
  • python scripts\research_pr_plan.py list
  • python scripts\research_pr_plan.py plan --all
  • python scripts\research_pr_plan.py plan --all --mode stacked
  • python scripts\research_snapshot.py plan
  • python scripts\research_review_branches.py --all

UHQ-Actual and others added 6 commits March 30, 2026 18:35
Co-authored-by: Codex <noreply@openai.com>
Copilot AI review requested due to automatic review settings March 31, 2026 00:58

Copilot AI left a comment


Pull request overview

This PR adds a Codex CLI–driven workflow for running structured “research specs” and managing their outputs (runs, datasets, dashboards), plus local tooling to plan/prepare reviewable PRs from commit history or a dirty worktree.

Changes:

  • Introduces a Codex research runner (doctor checks, one-shot runs, and queue processing) plus a heavy-equipment research spec and queue templates.
  • Adds dataset build + export automation, including optional targeted follow-up passes for unresolved image URLs.
  • Adds local operational tooling: results dashboard, PR bundle planning, worktree snapshot commits, and review-branch generation.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
README.md Documents the new research runner, dataset builder, dashboard, PR planning, snapshots, and review-branch tooling.
program.md Replaces the prior training/experiment program with reusable instructions tailored to Codex research jobs.
specs/heavy_construction_industrial_equipment.md Adds a templated heavy construction equipment research specification with phase/depth variables.
job_templates/heavy-equipment-phase1-seed.json Adds an example queued job for phase 1 at seed depth.
job_templates/heavy-equipment-phase1-exhaustive.json Adds an example queued job for phase 1 at exhaustive depth.
scripts/run_research.py Implements the Codex job runner (prompt rendering, variable substitution, run artifacts, queue processing).
scripts/build_heavy_equipment_dataset.py Automates multi-phase runs, merges outputs into CSV/JSON/markdown, and supports image follow-up passes.
scripts/export_heavy_equipment_report.py Exports a combined markdown report into dataset CSV/JSON + manifest.
scripts/research_dashboard.py Generates a local HTML/terminal dashboard summarizing runs and datasets under results/.
scripts/research_pr_plan.py Plans consolidated/stacked/individual PR bundles from branch-only commits using file-overlap dependency heuristics.
scripts/research_snapshot.py Groups dirty worktree changes into commit buckets and can create snapshot commits (dry-run by default).
scripts/research_review_branches.py Materializes PR bundles as local “review/*” branches via temporary worktrees + cherry-picks.


Comment thread README.md
```
python scripts/research_dashboard.py serve
```

By default that writes [results/dashboard/index.html](/C:/Users/uphol/Documents/Design/Coding/autoresearch/results/dashboard/index.html) and serves the repo at `http://127.0.0.1:8765/results/dashboard/index.html`.

Copilot AI Mar 31, 2026


The dashboard section includes a markdown link to an absolute local Windows path (/C:/Users/.../results/dashboard/index.html), which will be broken for other users and on GitHub. Use a repo-relative link (e.g., results/dashboard/index.html) or just refer to the path in inline code without embedding a machine-specific absolute path.

Suggested change
By default that writes [results/dashboard/index.html](/C:/Users/uphol/Documents/Design/Coding/autoresearch/results/dashboard/index.html) and serves the repo at `http://127.0.0.1:8765/results/dashboard/index.html`.
By default that writes [results/dashboard/index.html](results/dashboard/index.html) and serves the repo at `http://127.0.0.1:8765/results/dashboard/index.html`.

Comment thread README.md

## Dashboard

This repo now includes a lightweight run-history dashboard that reads the existing `results\` artifacts directly. It does not require pi, a web framework, or extra build steps.

Copilot AI Mar 31, 2026


Typo in the dashboard description: "does not require pi" reads like it should be "does not require pip" (Python package installer).

Suggested change
This repo now includes a lightweight run-history dashboard that reads the existing `results\` artifacts directly. It does not require pi, a web framework, or extra build steps.
This repo now includes a lightweight run-history dashboard that reads the existing `results\` artifacts directly. It does not require pip, a web framework, or extra build steps.

Comment thread scripts/run_research.py
Comment on lines +90 to +94
output_dir = output_root / f"{timestamp}-{slug}"
output_dir.mkdir(parents=True, exist_ok=False)
return output_dir



Copilot AI Mar 31, 2026


build_output_dir() uses a timestamp with second-level precision (%Y%m%d-%H%M%S) and fails if the directory already exists (exist_ok=False). In queue mode (or fast retries), multiple runs can land in the same second and collide, causing an avoidable failure. Consider adding higher-resolution time (microseconds), a random suffix, or a collision-retry loop.

Suggested change
output_dir = output_root / f"{timestamp}-{slug}"
output_dir.mkdir(parents=True, exist_ok=False)
return output_dir
# Ensure we don't fail spuriously if multiple runs land in the same second.
# Start with the base path and, on collision, append an incrementing suffix.
attempt = 0
while True:
    if attempt == 0:
        dir_name = f"{timestamp}-{slug}"
    else:
        dir_name = f"{timestamp}-{slug}-{attempt}"
    output_dir = output_root / dir_name
    try:
        output_dir.mkdir(parents=True, exist_ok=False)
        return output_dir
    except FileExistsError:
        attempt += 1

Comment on lines +66 to +71
def dataset_dir(depth: str, tag: str | None) -> Path:
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
suffix = tag or f"heavy-construction-equipment-{depth}"
path = DATASET_ROOT / f"{timestamp}-{suffix}"
path.mkdir(parents=True, exist_ok=False)
return path

Copilot AI Mar 31, 2026


dataset_dir() uses a second-resolution timestamp and un-sanitized tag to form a folder name. This can (a) collide if multiple datasets are created within the same second and (b) fail or create surprising paths if --tag contains spaces, slashes, or other filesystem-unfriendly characters. Consider slugifying the suffix and adding collision avoidance (microseconds / random suffix / retry).

Comment on lines +92 to +105
if len(cells) >= 6:
    rows.append(
        {
            "phase_number": phase_number,
            "phase": phase_label,
            "category": current_category,
            "row_number": cells[0],
            "equipment_name": cells[1],
            "also_known_as": cells[2],
            "description": cells[3],
            "typical_use_phase": cells[4],
            "image_url": cells[5],
        }
    )

Copilot AI Mar 31, 2026


parse_markdown_report() splits table rows on every |, so any pipe character produced in aliases/descriptions will shift columns and silently corrupt the extracted fields (e.g., image_url no longer ends up in cells[5]). This is likely with LLM-generated tables. Consider enforcing/escaping pipes in the spec, or make the parser robust (e.g., split with a max column count and/or re-join overflow columns into the description field, and validate the URL column).

Suggested change
if len(cells) >= 6:
    rows.append(
        {
            "phase_number": phase_number,
            "phase": phase_label,
            "category": current_category,
            "row_number": cells[0],
            "equipment_name": cells[1],
            "also_known_as": cells[2],
            "description": cells[3],
            "typical_use_phase": cells[4],
            "image_url": cells[5],
        }
    )
# Expect 6 logical columns:
#   0: row_number
#   1: equipment_name
#   2: also_known_as
#   3: description (may contain '|')
#   4: typical_use_phase
#   5: image_url
if len(cells) < 6:
    i += 1
    continue
if len(cells) == 6:
    row_number = cells[0]
    equipment_name = cells[1]
    also_known_as = cells[2]
    description = cells[3]
    typical_use_phase = cells[4]
    image_url = cells[5]
else:
    # Extra '|' characters likely appeared in a text field.
    # Keep the first three and last two columns fixed, and
    # join any overflow into the description field.
    row_number = cells[0]
    equipment_name = cells[1]
    also_known_as = cells[2]
    if len(cells) > 3:
        if len(cells) > 4:
            typical_use_phase = cells[-2]
            image_url = cells[-1]
            description_parts = cells[3:-2]
            description = "|".join(part.strip() for part in description_parts)
        else:
            # Fallback: treat remaining cell as description, and leave
            # typical_use_phase and image_url empty.
            description = cells[3]
            typical_use_phase = ""
            image_url = ""
    else:
        # Should not happen with len(cells) >= 6, but guard anyway.
        description = ""
        typical_use_phase = ""
        image_url = ""
rows.append(
    {
        "phase_number": phase_number,
        "phase": phase_label,
        "category": current_category,
        "row_number": row_number,
        "equipment_name": equipment_name,
        "also_known_as": also_known_as,
        "description": description,
        "typical_use_phase": typical_use_phase,
        "image_url": image_url,
    }
)

Comment on lines +50 to +65
cells = [cell.strip() for cell in row_line.strip("|").split("|")]
if len(cells) >= 6:
    phase_number, phase_label = phase_for_category(current_category)
    rows.append(
        {
            "phase_number": phase_number,
            "phase": phase_label,
            "category": current_category,
            "row_number": cells[0],
            "equipment_name": cells[1],
            "also_known_as": cells[2],
            "description": cells[3],
            "typical_use_phase": cells[4],
            "image_url": cells[5],
        }
    )

Copilot AI Mar 31, 2026


parse_report() splits markdown table rows on every |, which will mis-parse rows if the report contains pipes inside any cell (common in LLM outputs), leading to wrong CSV/JSON fields (especially image_url). Consider using a more robust split (max columns) and/or validating the extracted URL column before accepting the row.

Suggested change
cells = [cell.strip() for cell in row_line.strip("|").split("|")]
if len(cells) >= 6:
    phase_number, phase_label = phase_for_category(current_category)
    rows.append(
        {
            "phase_number": phase_number,
            "phase": phase_label,
            "category": current_category,
            "row_number": cells[0],
            "equipment_name": cells[1],
            "also_known_as": cells[2],
            "description": cells[3],
            "typical_use_phase": cells[4],
            "image_url": cells[5],
        }
    )
raw_cells = [cell.strip() for cell in row_line.strip("|").split("|")]
if len(raw_cells) < 6:
    i += 1
    continue
# If there are more than 6 split segments (due to '|' inside cells),
# collapse all remaining segments into the last column.
if len(raw_cells) > 6:
    cells = raw_cells[:5] + ["|".join(raw_cells[5:]).strip()]
else:
    cells = raw_cells
# Validate that the image_url looks like an HTTP(S) URL before accepting the row.
image_url = cells[5]
if not re.match(r"^https?://", image_url):
    i += 1
    continue
phase_number, phase_label = phase_for_category(current_category)
rows.append(
    {
        "phase_number": phase_number,
        "phase": phase_label,
        "category": current_category,
        "row_number": cells[0],
        "equipment_name": cells[1],
        "also_known_as": cells[2],
        "description": cells[3],
        "typical_use_phase": cells[4],
        "image_url": image_url,
    }
)

Comment on lines +143 to +156
temp_dir = Path(tempfile.mkdtemp(prefix="review-branch-"))
try:
    git(["worktree", "add", "--quiet", "-b", branch_name, str(temp_dir), base_ref], cwd=repo_root)
    try:
        if bundle_commits:
            git(["cherry-pick", *[commit.commit_hash for commit in bundle_commits]], cwd=temp_dir)
    except Exception:
        git(["cherry-pick", "--abort"], cwd=temp_dir, check=False)
        raise
    finally:
        git(["worktree", "remove", "--force", str(temp_dir)], cwd=repo_root, check=False)
finally:
    shutil.rmtree(temp_dir, ignore_errors=True)


Copilot AI Mar 31, 2026


If a cherry-pick fails in create_branch_from_bundle(), the worktree is removed but the newly created branch ref can remain pointing at base_ref, leaving behind partially-created review branches. Consider deleting branch_name on failure (or creating the branch only after a successful cherry-pick) so a failed run doesn’t require manual cleanup.
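The suggested cleanup could be sketched as a standalone function like the one below. This is illustrative, not the PR's code: the `git()` wrapper mimics the helper signature visible in the excerpt, and the key change is that a failed cherry-pick also removes the worktree and deletes the half-built branch ref before re-raising.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(args: list[str], cwd: Path, check: bool = True) -> subprocess.CompletedProcess:
    """Minimal stand-in for the script's git() helper (assumed signature)."""
    return subprocess.run(["git", *args], cwd=cwd, check=check,
                          capture_output=True, text=True)


def create_branch_from_bundle(repo_root: Path, branch_name: str,
                              base_ref: str, commit_hashes: list[str]) -> None:
    """Cherry-pick commits onto a fresh branch; on failure, delete the branch."""
    temp_dir = Path(tempfile.mkdtemp(prefix="review-branch-"))
    try:
        git(["worktree", "add", "--quiet", "-b", branch_name, str(temp_dir), base_ref],
            cwd=repo_root)
        try:
            if commit_hashes:
                git(["cherry-pick", *commit_hashes], cwd=temp_dir)
        except Exception:
            git(["cherry-pick", "--abort"], cwd=temp_dir, check=False)
            # The worktree must go first (it has branch_name checked out);
            # only then can the partially-created branch ref be deleted.
            git(["worktree", "remove", "--force", str(temp_dir)], cwd=repo_root,
                check=False)
            git(["branch", "-D", branch_name], cwd=repo_root, check=False)
            raise
        finally:
            git(["worktree", "remove", "--force", str(temp_dir)], cwd=repo_root,
                check=False)
    finally:
        shutil.rmtree(temp_dir, ignore_errors=True)
```

The alternative the comment mentions, creating the branch only after a successful cherry-pick (e.g., a detached-HEAD worktree followed by `git branch branch_name HEAD`), avoids the cleanup entirely at the cost of a slightly longer happy path.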
