
AutoResearchClaw Integration Guide

The simplest way to use AutoResearchClaw: give the repo URL to OpenClaw and say "Research [your topic]." That's it — OpenClaw handles cloning, installing, configuring, and running the entire 23-stage pipeline for you.

This guide is for humans who want to understand what's happening under the hood, or who prefer to set things up manually.


Table of Contents

  1. The Easy Way: OpenClaw
  2. Manual Setup
  3. Configuration Walkthrough
  4. Running the Pipeline
  5. Understanding the 23 Stages
  6. Output Artifacts
  7. Experiment Modes
  8. Conference Templates
  9. OpenClaw Bridge (Advanced)
  10. MetaClaw Integration (Cross-Run Learning)
  11. Other AI Platforms
  12. Python API
  13. Troubleshooting
  14. FAQ

1. The Easy Way: OpenClaw

If you use OpenClaw as your AI assistant, you don't need to read the rest of this guide.

Steps

  1. Share the GitHub repo URL with OpenClaw:
    https://github.com/aiming-lab/AutoResearchClaw
    
  2. OpenClaw reads RESEARCHCLAW_AGENTS.md and README.md — it now understands the entire system.

    Note: RESEARCHCLAW_AGENTS.md is generated locally and listed in .gitignore. If it doesn't exist, OpenClaw can bootstrap from README.md and the project structure.

  3. Say something like:
    Research the application of graph neural networks in drug discovery
    
  4. OpenClaw will:
    • Clone the repo
    • Create a virtual environment and install dependencies (pip install -e .)
    • Copy config.researchclaw.example.yaml → config.yaml
    • Ask you for an OpenAI API key (or use your environment variable)
    • Run the full 23-stage pipeline
    • Return the paper, experiment code, charts, and citations

That's the whole process. OpenClaw is designed to read agent definition files and bootstrap itself. AutoResearchClaw ships with these files specifically so that any OpenClaw-compatible AI assistant can pick it up and run.

What if I want to tweak settings?

Tell OpenClaw in natural language:

  • "Use GPT-5.2 instead of GPT-4o"
  • "Run experiments in sandbox mode, not simulated"
  • "Target ICLR 2025 format instead of NeurIPS"
  • "Skip the quality gate, just auto-approve everything"

OpenClaw will modify config.yaml accordingly before running the pipeline.


2. Manual Setup

Prerequisites

| Requirement | Details |
|---|---|
| Python | 3.11 or newer |
| LLM API | Any OpenAI-compatible endpoint (OpenAI, Azure, local proxy, etc.) |
| Disk space | ~100 MB for the repo + artifacts per run |
| Network | Required for LLM API calls and literature search (Semantic Scholar, arXiv) |

Installation

# Clone the repository
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw

# Create a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate    # macOS/Linux
# .venv\Scripts\activate     # Windows

# Install
pip install -e .

Verify Installation

# Check the CLI is available
researchclaw --help

# Validate your configuration
researchclaw validate --config config.yaml

3. Configuration Walkthrough

Start from the provided template:

cp config.researchclaw.example.yaml config.yaml

Open config.yaml in your editor. Here's what each section does:

LLM Settings (Required)

This is the only section you must configure. Everything else has sensible defaults.

llm:
  base_url: "https://api.openai.com/v1"     # Your LLM API endpoint
  api_key_env: "OPENAI_API_KEY"              # Environment variable name...
  api_key: ""                                # ...or paste the key directly here
  primary_model: "gpt-4o"                    # Model to use (gpt-4o, gpt-5.2, etc.)
  fallback_models:                           # Tried in order if primary fails
    - "gpt-4.1"
    - "gpt-4o-mini"
  s2_api_key: ""                             # Optional: Semantic Scholar API key for higher rate limits

Using an environment variable (recommended for security):

export OPENAI_API_KEY="sk-..."

Using a direct key (simpler, less secure):

llm:
  api_key: "sk-your-key-here"

Using a proxy or alternative provider:

llm:
  base_url: "https://your-proxy.example.com/v1"
  api_key: "your-proxy-key"
  primary_model: "gpt-4o"    # Must be supported by your endpoint
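
If you want a quick sanity check that your endpoint, key, and model name work together before touching researchclaw, a few lines against the OpenAI-compatible API are enough. This is a minimal sketch, independent of the pipeline; the endpoint, environment variable, and model below are placeholders that should mirror your config.yaml:

# Illustrative connectivity check for an OpenAI-compatible endpoint.
# The base_url, env var, and model are assumptions -- mirror your config.yaml.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.openai.com/v1",      # llm.base_url
    api_key=os.environ["OPENAI_API_KEY"],      # llm.api_key_env
)

reply = client.chat.completions.create(
    model="gpt-4o",                            # llm.primary_model
    messages=[{"role": "user", "content": "Reply with OK"}],
    max_tokens=5,
)
print(reply.choices[0].message.content)        # expect a short "OK"-style reply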

Research Settings

research:
  topic: "Your research topic here"    # Can also be set via CLI --topic flag
  domains:
    - "machine-learning"               # Guides literature search scope
  daily_paper_count: 10                # Target papers to collect
  quality_threshold: 4.0               # Minimum paper quality score (1-5)

Experiment Settings

experiment:
  mode: "sandbox"              # How experiments run (see Section 7)
  time_budget_sec: 300         # Max seconds per experiment run
  max_iterations: 10           # Max refinement loops in Stage 13
  metric_key: "primary_metric" # What metric to optimize
  metric_direction: "minimize" # "minimize" or "maximize"
  sandbox:
    python_path: ".venv/bin/python3"   # Python binary for sandbox execution
    gpu_required: false
    max_memory_mb: 4096
  code_agent:                        # CodeAgent v2 (multi-phase code generation)
    enabled: true                    # Architecture planning + sequential file gen + hard validation
  benchmark_agent:                   # Automated dataset & baseline selection
    enabled: true                    # 4-agent pipeline: Surveyor→Selector→Acquirer→Validator
  figure_agent:                      # Academic figure generation
    enabled: true                    # 5-agent pipeline: Planner→CodeGen→Renderer→Critic→Integrator
  repair:                            # Anti-fabrication experiment repair
    enabled: true                    # Diagnose and fix failed experiments before paper writing
    max_cycles: 3                    # Repair retry loops
  opencode:                          # OpenCode Beast Mode (see README for details)
    enabled: true

Export Settings

export:
  target_conference: "neurips_2025"   # See Section 8 for all available templates
  authors: "Anonymous"                 # Author line in the paper
  bib_file: "references"              # BibTeX file name (without .bib)

Everything Else (Optional)

These have reasonable defaults. Change them only if you need to:

project:
  name: "my-research"      # Just an identifier for your run
  mode: "full-auto"         # "docs-first", "semi-auto", or "full-auto"

runtime:
  timezone: "America/New_York"
  max_parallel_tasks: 3
  approval_timeout_hours: 12
  retry_limit: 2

security:
  hitl_required_stages: [5, 9, 20]     # Stages that pause for human approval
  allow_publish_without_approval: false

notifications:
  channel: "console"        # "console", "discord", or "slack"

knowledge_base:
  backend: "markdown"
  root: "docs/kb"

4. Running the Pipeline

Basic Run

# Run with topic from config.yaml
researchclaw run --config config.yaml --auto-approve

# Override topic from command line
researchclaw run --config config.yaml --topic "Transformer attention for time series" --auto-approve

CLI Commands

| Command | What It Does |
|---|---|
| researchclaw setup | Interactive first-time setup (installs OpenCode Beast Mode, checks Docker/LaTeX) |
| researchclaw init | Interactive config creation (choose LLM provider, creates config.arc.yaml) |
| researchclaw run | Run the full 23-stage pipeline |
| researchclaw validate | Check your config file for errors |
| researchclaw doctor | Diagnose environment issues (Python, dependencies, API connectivity) |
| researchclaw report --run-dir <path> | Generate a human-readable summary of a completed run |

Run Flags

| Flag | Effect |
|---|---|
| --topic "..." | Override the topic in config.yaml |
| --config path | Config file path (default: config.yaml) |
| --output path | Output directory (default: artifacts/<run-id>/) |
| --auto-approve | Skip manual approval at gate stages (5, 9, 20) |
| --from-stage STAGE_NAME | Start from a specific stage (e.g., PAPER_OUTLINE) |
| --resume | Resume from the last checkpoint (auto-detects the most recent run matching your topic) |
| --skip-preflight | Skip the LLM connectivity check before starting |
| --skip-noncritical-stage | Skip non-critical stages on failure instead of aborting |
| --no-graceful-degradation | Fail the pipeline on quality gate failure instead of degrading gracefully |

Examples

# Full autonomous run — no human intervention
researchclaw run -c config.yaml -t "Graph neural networks for protein folding" --auto-approve

# Resume a failed run from where it stopped
researchclaw run -c config.yaml --resume --auto-approve

# Re-run just the paper writing stages
researchclaw run -c config.yaml --from-stage PAPER_OUTLINE --auto-approve

# Check your setup before running
researchclaw doctor -c config.yaml

5. Understanding the 23 Stages

The pipeline runs in 8 phases. Each stage reads artifacts from previous stages and produces new ones.

Phase A: Research Scoping

| # | Stage | What Happens | Produces |
|---|---|---|---|
| 1 | TOPIC_INIT | LLM formulates a SMART research goal; auto-detects GPU hardware (NVIDIA/MPS/CPU) | goal.md, hardware_profile.json |
| 2 | PROBLEM_DECOMPOSE | Breaks the goal into prioritized sub-questions | problem_tree.md |

Phase B: Literature Discovery

| # | Stage | What Happens | Produces |
|---|---|---|---|
| 3 | SEARCH_STRATEGY | Plans search queries and data sources | search_plan.yaml, sources.json |
| 4 | LITERATURE_COLLECT | Queries real APIs (arXiv-first, then Semantic Scholar) with expanded queries for broad coverage | candidates.jsonl |
| 5 | LITERATURE_SCREEN [Gate] | Filters by relevance and quality | shortlist.jsonl |
| 6 | KNOWLEDGE_EXTRACT | Extracts structured knowledge cards from each paper | cards/ |
Phase C: Knowledge Synthesis

| # | Stage | What Happens | Produces |
|---|---|---|---|
| 7 | SYNTHESIS | Clusters findings, identifies research gaps | synthesis.md |
| 8 | HYPOTHESIS_GEN | Generates falsifiable hypotheses | hypotheses.md |

Phase D: Experiment Design

| # | Stage | What Happens | Produces |
|---|---|---|---|
| 9 | EXPERIMENT_DESIGN [Gate] | Designs experiment plan with baselines and metrics | exp_plan.yaml |
| 10 | CODE_GENERATION | LLM writes hardware-aware experiment code (adapts packages/constraints to GPU tier) | experiment.py, experiment_spec.md |
| 11 | RESOURCE_PLANNING | Estimates GPU/time requirements | schedule.json |

Phase E: Experiment Execution

| # | Stage | What Happens | Produces |
|---|---|---|---|
| 12 | EXPERIMENT_RUN | Runs the experiment code (sandbox or simulated); immutable harness injected for time guard and metric validation; partial results captured on timeout | runs/ |
| 13 | ITERATIVE_REFINE | LLM analyzes results, improves code, re-runs (up to 10 iterations); timeout-aware prompts; NaN/divergence fast-fail; stdout truncated for context efficiency | refinement_log.json, experiment_final.py |

Phase F: Analysis & Decision

| # | Stage | What Happens | Produces |
|---|---|---|---|
| 14 | RESULT_ANALYSIS | Statistical analysis of experiment results | analysis.md |
| 15 | RESEARCH_DECISION | PROCEED / PIVOT decision with evidence | decision.md |

Phase G: Paper Writing

| # | Stage | What Happens | Produces |
|---|---|---|---|
| 16 | PAPER_OUTLINE | Creates section-level paper outline | outline.md |
| 17 | PAPER_DRAFT | Writes paper section-by-section (3 LLM calls, 5,000-6,500 words); hard-blocked when no experiment metrics (anti-fabrication); conference-grade title guidelines and abstract structure injected | paper_draft.md |
| 18 | PEER_REVIEW | Simulates 2+ reviewer perspectives with NeurIPS/ICML rubric (1-10 scoring); checks baselines, ablations, claims vs evidence | reviews.md |
| 19 | PAPER_REVISION | Addresses review comments with length guard (auto-retries if revised paper is shorter than draft) | paper_revised.md |

Phase H: Finalization

| # | Stage | What Happens | Produces |
|---|---|---|---|
| 20 | QUALITY_GATE [Gate] | Checks paper quality score | quality_report.json |
| 21 | KNOWLEDGE_ARCHIVE | Saves retrospective + reproducibility bundle | archive.md, bundle_index.json |
| 22 | EXPORT_PUBLISH | Generates LaTeX, charts, and code package | paper_final.md, paper.tex, code/ |
| 23 | CITATION_VERIFY | Fact-checks all references against real APIs | verification_report.json, references_verified.bib |

Gate Stages

Three stages pause for human review (unless --auto-approve is set):

| Gate | What's Being Reviewed | On Reject, Rolls Back To |
|---|---|---|
| Stage 5 | Are the collected papers relevant and sufficient? | Stage 4 (re-collect literature) |
| Stage 9 | Is the experiment design sound? | Stage 8 (re-generate hypotheses) |
| Stage 20 | Does the paper meet quality standards? | Stage 16 (re-write from outline) |

For fully autonomous operation, always use --auto-approve.


6. Output Artifacts

Each run creates a timestamped directory under artifacts/:

artifacts/rc-20260310-143200-a1b2c3/
├── stage-1/goal.md                        # Research goal
├── stage-2/problem_tree.md                # Problem decomposition
├── stage-3/search_plan.yaml               # Search strategy
├── stage-4/candidates.jsonl               # Raw literature results
├── stage-5/shortlist.jsonl                # Screened papers
├── stage-6/cards/                         # Knowledge cards (one per paper)
├── stage-7/synthesis.md                   # Research gap analysis
├── stage-8/hypotheses.md                  # Research hypotheses
├── stage-9/exp_plan.yaml                  # Experiment plan
├── stage-10/experiment.py                 # Generated experiment code
├── stage-10/experiment_spec.md            # Experiment specification
├── stage-11/schedule.json                 # Resource schedule
├── stage-12/runs/run-1.json               # Experiment results
├── stage-13/experiment_final.py           # Refined experiment code
├── stage-13/experiment_v1.py              # Iteration 1 snapshot
├── stage-13/refinement_log.json           # Refinement history
├── stage-14/analysis.md                   # Statistical analysis
├── stage-14/experiment_summary.json       # Metrics summary
├── stage-15/decision.md                   # Proceed/Pivot decision
├── stage-16/outline.md                    # Paper outline
├── stage-17/paper_draft.md                # Full paper draft
├── stage-18/reviews.md                    # Simulated peer reviews
├── stage-19/paper_revised.md              # Revised paper
├── stage-20/quality_report.json           # Quality assessment
├── stage-21/archive.md                    # Knowledge retrospective
├── stage-22/
│   ├── paper_final.md                     # Final paper (Markdown)
│   ├── paper.tex                          # Conference-ready LaTeX
│   ├── references.bib                     # BibTeX references
│   ├── charts/                            # Result visualizations
│   └── code/                              # Open-source code package
│       ├── experiment.py
│       ├── requirements.txt
│       └── README.md
├── stage-23/
│   ├── verification_report.json           # Citation fact-check results
│   └── references_verified.bib            # Cleaned bibliography
└── pipeline_summary.json                  # Overall execution summary

Key Output Files

| File | What You'll Use It For |
|---|---|
| stage-22/paper.tex | Submit to a conference (compile with pdflatex or tectonic) |
| stage-22/paper_final.md | Read or further edit the paper |
| stage-22/references.bib | Bibliography for LaTeX compilation |
| stage-22/code/ | Share experiment code alongside the paper |
| stage-23/verification_report.json | Check which citations are real vs. hallucinated |
| stage-13/experiment_final.py | The best-performing experiment code |
| stage-22/charts/ | Figures for the paper |
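
If you prefer to inspect a run programmatically rather than with researchclaw report, a short script can walk the layout above and show which stage outputs exist. A minimal sketch, assuming only the directory structure shown in this section:

# Illustrative sketch: list which stage directories of a run produced output.
# Assumes only the artifacts/<run-id>/stage-N/ layout shown above.
from pathlib import Path

run_dir = Path("artifacts/rc-20260310-143200-a1b2c3")   # replace with your run

for stage in range(1, 24):
    stage_dir = run_dir / f"stage-{stage}"
    files = sorted(p.name for p in stage_dir.glob("*")) if stage_dir.exists() else []
    status = ", ".join(files) if files else "missing"
    print(f"stage-{stage:>2}: {status}")

paper = run_dir / "stage-22" / "paper.tex"
print("LaTeX ready:", paper.exists())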

7. Experiment Modes

AutoResearchClaw supports four modes for running experiments:

Simulated (Default)

experiment:
  mode: "simulated"

The LLM generates synthetic experiment results without executing any code. This is fast and requires no special setup, but the results are not real.

Best for: Quick prototyping, testing the pipeline end-to-end, environments without Python scientific packages.

Sandbox

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python3"
    gpu_required: false
    max_memory_mb: 4096

The pipeline generates Python code and actually runs it in a subprocess. The code is validated before execution (AST parsing, import whitelist, no file I/O outside sandbox). Hardware-aware: Stage 1 auto-detects your GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts the generated code accordingly — high-tier GPUs get full PyTorch code, limited GPUs get lightweight experiments, CPU-only gets NumPy/sklearn only.

Best for: Real experiments on your local machine. Supports numpy and the standard library; deep learning frameworks (torch, tensorflow) are available if they are installed in your environment and a GPU is detected.

Safety features:

  • Code validation blocks dangerous operations (subprocess, eval, exec, network calls)
  • Configurable memory limit and execution timeout
  • Auto-repair: if generated code has validation errors, the LLM fixes them (up to 3 attempts)
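
To give a feel for what this kind of validation involves, here is a minimal sketch of an AST-based import whitelist check. It is illustrative only, not the sandbox's actual validator; the whitelist contents and function name are assumptions:

# Illustrative AST-based import whitelist (not the actual sandbox validator).
import ast

ALLOWED_IMPORTS = {"numpy", "math", "random", "json", "time"}   # hypothetical whitelist

def check_imports(source: str) -> list[str]:
    """Return the disallowed top-level modules imported by the code."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module.split(".")[0]] if node.module else []
        else:
            continue
        violations += [n for n in names if n not in ALLOWED_IMPORTS]
    return violations

print(check_imports("import numpy as np\nimport subprocess"))   # ['subprocess']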

Docker

experiment:
  mode: "docker"
  docker:
    image: "researchclaw/experiment:latest"
    gpu_enabled: true
    memory_limit_mb: 8192
    network_policy: "setup_only"   # none | setup_only | pip_only | full
    auto_install_deps: true
    shm_size_mb: 2048

The pipeline runs generated code inside a Docker container with GPU passthrough, dependency auto-installation, and network isolation. Execution follows a three-phase model within a single container:

  1. Phase 0 (pip install): Installs auto-detected dependencies from requirements.txt (network enabled)
  2. Phase 1 (setup.py): Runs setup.py for dataset downloads and environment preparation (network enabled)
  3. Phase 2 (experiment): Executes the experiment code (network disabled by default via iptables)

Network policies:

  • none — No network at all (all phases offline). Requires all deps pre-installed in image.
  • setup_only (default) — Network during Phase 0+1, disabled before Phase 2 via iptables (--cap-add=NET_ADMIN).
  • pip_only — Network only during Phase 0 (pip install), disabled for Phase 1+2.
  • full — Network available throughout all phases.

Pre-cached datasets: The Docker image includes CIFAR-10/100, MNIST, FashionMNIST, STL-10, and SVHN at /opt/datasets, mounted read-only as /workspace/data. No download needed for these standard benchmarks.

Best for: Reproducible experiments with full dependency isolation. Supports GPU passthrough (NVIDIA) and configurable network policies.

Setup: Build the image first:

docker build -t researchclaw/experiment:latest researchclaw/docker/

SSH Remote

experiment:
  mode: "ssh_remote"
  ssh_remote:
    host: "gpu-server.example.com"
    gpu_ids: [0, 1]
    remote_workdir: "/tmp/researchclaw_experiments"

The pipeline sends generated code to a remote GPU server for execution.

Best for: Experiments that require GPU hardware you don't have locally.


8. Conference Templates

AutoResearchClaw generates LaTeX files formatted for specific conferences:

export:
  target_conference: "neurips_2025"

| Conference | Config Value | Layout |
|---|---|---|
| NeurIPS 2025 | neurips_2025 (default) | Single-column, neurips_2025 style |
| NeurIPS 2024 | neurips_2024 | Single-column, neurips_2024 style |
| ICLR 2026 | iclr_2026 | Single-column, iclr2026_conference style |
| ICLR 2025 | iclr_2025 | Single-column, iclr2025_conference style |
| ICML 2026 | icml_2026 | Double-column, icml2026 style |
| ICML 2025 | icml_2025 | Double-column, icml2025 style |

Short aliases are also accepted: neurips (→ 2025), iclr (→ 2026), icml (→ 2026).

The Markdown-to-LaTeX converter handles:

  • Section headings (#, ##, ###)
  • Inline and display math ($...$, $$...$$)
  • Bold and italic text
  • Ordered and unordered lists
  • Tables
  • Code blocks
  • Citation references ([cite_key] → \cite{cite_key}; see the sketch below)
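
As an example of the citation mapping, a single regex pass can rewrite Markdown-style [cite_key] markers into \cite{...} commands. This is an illustrative sketch, not the converter's actual implementation:

# Illustrative sketch of the [cite_key] -> \cite{cite_key} rewrite (not the real converter).
import re

def convert_citations(markdown: str) -> str:
    # Match [some_key] where the key looks like a BibTeX identifier.
    return re.sub(r"\[([A-Za-z][\w:+-]*)\]", r"\\cite{\1}", markdown)

print(convert_citations("Prior work [vaswani2017attention] showed ..."))
# Prior work \cite{vaswani2017attention} showed ...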

Compiling the LaTeX

# Using tectonic (recommended)
tectonic artifacts/<run-id>/stage-22/paper.tex

# Using pdflatex
cd artifacts/<run-id>/stage-22/
pdflatex paper.tex
bibtex paper
pdflatex paper.tex
pdflatex paper.tex

9. OpenClaw Bridge (Advanced)

For deeper integration with OpenClaw, AutoResearchClaw includes a bridge adapter system. Each flag in the config activates a typed protocol interface:

openclaw_bridge:
  use_cron: true              # Scheduled research runs
  use_message: true           # Progress notifications (Discord/Slack/Telegram)
  use_memory: true            # Cross-session knowledge persistence
  use_sessions_spawn: true    # Spawn parallel sub-sessions for concurrent stages
  use_web_fetch: true         # Live web search during literature review
  use_browser: false          # Browser-based paper collection

What Each Adapter Does

| Adapter | Protocol | Use Case |
|---|---|---|
| Cron | CronAdapter.schedule_resume(run_id, stage_id, reason) | Schedule pipeline resumption (e.g., daily re-runs) |
| Message | MessageAdapter.notify(channel, subject, body) | Send progress updates to chat platforms |
| Memory | MemoryAdapter.append(namespace, content) | Persist knowledge across sessions |
| Sessions | SessionsAdapter.spawn(name, command) | Run pipeline stages in parallel sub-sessions |
| WebFetch | WebFetchAdapter.fetch(url) | Fetch web pages during literature search |
| Browser | BrowserAdapter.open(url) | Open and interact with web pages |

When OpenClaw provides a capability (e.g., message sending), the adapter consumes it automatically. When running standalone, recording stubs capture all calls for debugging without side effects.
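
The recording-stub pattern is easy to picture. Here is a minimal sketch of what a standalone stub for the message adapter could look like; the class and field names are illustrative, not the actual bridge code:

# Illustrative recording stub for a message-style adapter (not the actual bridge code).
from dataclasses import dataclass, field

@dataclass
class RecordingMessageAdapter:
    """Captures notify() calls instead of sending them, so standalone runs have no side effects."""
    calls: list[dict] = field(default_factory=list)

    def notify(self, channel: str, subject: str, body: str) -> None:
        self.calls.append({"channel": channel, "subject": subject, "body": body})

adapter = RecordingMessageAdapter()
adapter.notify("console", "Stage 5 complete", "Literature screening passed the gate.")
print(adapter.calls)   # inspect captured notifications when debugging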

This is an extension point — you don't need to configure it for basic usage.


10. MetaClaw Integration (Cross-Run Learning)

MetaClaw adds cross-run knowledge transfer to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and converts them into reusable skills that improve subsequent runs.

Architecture

┌──────────────────────────────────────────────────────┐
│              AutoResearchClaw Pipeline                │
│  Stage 1 → 2 → ... → 23                             │
│                                                      │
│  ┌─────────────┐    ┌──────────────────────────────┐ │
│  │ LLMClient   │───▶│ MetaClaw Integration Layer   │ │
│  │             │    │ (metaclaw_bridge module)      │ │
│  └─────────────┘    └──────────┬───────────────────┘ │
│                                │                     │
│  ┌─────────────┐    ┌──────────▼───────────────────┐ │
│  │ Evolution   │◀──▶│ Lesson ↔ Skill Bridge        │ │
│  │ Store       │    └─────────────────────────────┘ │
│  └─────────────┘                                     │
└──────────────────────────┬───────────────────────────┘
                           │
            ┌──────────────▼──────────────┐
            │     MetaClaw Proxy Server    │
            │     (optional, :30000)       │
            │  ┌────────────────────────┐  │
            │  │ SkillManager (40+ skills)│ │
            │  │ + arc-* learned skills   │ │
            │  └────────────────────────┘  │
            └─────────────────────────────┘

How It Works

  1. Lesson Capture: During each pipeline run, the EvolutionStore automatically records failures, warnings, and anomalies as structured lessons in evolution/lessons.jsonl.

  2. Lesson → Skill Conversion: After a run completes, lessons above a configurable severity threshold are converted into arc-* skill files stored in ~/.metaclaw/skills/. Each skill contains: trigger conditions, failure root cause, and actionable guidance.

  3. Skill Injection: On the next run, build_overlay() reads all arc-* skills and injects them into the LLM prompt for every stage via the evolution_overlay parameter. The LLM receives explicit instructions to avoid previously encountered pitfalls.

  4. Proxy Routing (Optional): When the MetaClaw proxy is running, LLM requests are routed through it for additional skill matching and session tracking. If the proxy is unavailable, requests automatically fall back to the direct LLM endpoint.
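
The overlay mechanics can be approximated in a few lines: gather every arc-* SKILL.md under the skills directory and concatenate them into a single prompt prefix. This is a simplified sketch of the idea, not the real build_overlay() implementation:

# Simplified sketch of cross-run skill injection (not the real build_overlay()).
from pathlib import Path

def build_overlay_sketch(skills_dir: str = "~/.metaclaw/skills") -> str:
    root = Path(skills_dir).expanduser()
    sections = []
    for skill_file in sorted(root.glob("arc-*/SKILL.md")):
        sections.append(f"## {skill_file.parent.name}\n{skill_file.read_text()}")
    # The combined overlay is prepended to every stage prompt so the LLM sees past pitfalls.
    return "\n\n".join(sections)

overlay = build_overlay_sketch()
print(f"{len(overlay)} characters of learned guidance will be injected")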

Setup

Step 1: Install MetaClaw

pip install metaclaw
# Or clone from source:
git clone https://github.com/aiming-lab/MetaClaw.git
cd metaclaw && pip install -e .

Step 2: Configure

Add the metaclaw_bridge section to your config.arc.yaml:

metaclaw_bridge:
  enabled: true
  proxy_url: "http://localhost:30000/v1"    # MetaClaw proxy (optional)
  skills_dir: "~/.metaclaw/skills"          # Skill storage directory
  fallback_url: "https://api.openai.com/v1" # Direct LLM fallback
  fallback_api_key_env: "OPENAI_API_KEY"
  lesson_to_skill:
    enabled: true
    min_severity: "warning"                 # Convert warnings + errors
    max_skills_per_run: 5                   # Max new skills per run

Step 3: Run

# First run — captures lessons, generates initial skills
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve

# Check generated skills
ls ~/.metaclaw/skills/arc-*/SKILL.md

# Second run — skills from Run 1 are automatically injected
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve

Optional: Start MetaClaw Proxy

For full skill matching and session tracking:

metaclaw start --mode skills_only --port 30000
# Or use the provided script:
bash scripts/metaclaw_start.sh

The proxy is optional — without it, the pipeline still benefits from skill injection via build_overlay() and falls back to your configured LLM endpoint.
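
The fallback behavior is conceptually simple: try the proxy first and, on a connection error, retry the same request against the direct endpoint. A rough sketch using the openai client library, not the actual researchclaw/llm/client.py logic:

# Rough sketch of proxy-first routing with direct fallback (not the actual client.py).
import os
from openai import OpenAI, APIConnectionError

def chat_with_fallback(messages, model="gpt-4o"):
    proxy = OpenAI(base_url="http://localhost:30000/v1", api_key=os.environ["OPENAI_API_KEY"])
    direct = OpenAI(base_url="https://api.openai.com/v1", api_key=os.environ["OPENAI_API_KEY"])
    try:
        return proxy.chat.completions.create(model=model, messages=messages)
    except APIConnectionError:
        # Proxy unreachable: fall back to the configured direct endpoint.
        return direct.chat.completions.create(model=model, messages=messages)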

Experiment Results

In controlled A/B experiments (same topic, same LLM, same configuration):

| Metric | Baseline | With MetaClaw | Improvement |
|---|---|---|---|
| Stage retry rate | 10.5% | 7.9% | -24.8% |
| Refine cycle count | 2.0 | 1.2 | -40.0% |
| Pipeline stage completion | 18/19 | 19/19 | +5.3% |
| Overall robustness score (composite) | 0.714 | 0.845 | +18.3% |

Composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and refine cycle efficiency (30%).

Key Files

| File | Purpose |
|---|---|
| researchclaw/metaclaw_bridge/ | Integration module (config, session, lesson_to_skill, prm_gate, skill_feedback) |
| researchclaw/evolution.py | build_overlay() — reads intra-run lessons + cross-run arc-* skills |
| researchclaw/llm/client.py | Proxy routing with automatic fallback |
| ~/.metaclaw/skills/arc-*/SKILL.md | Learned skill files (auto-generated) |
| scripts/metaclaw_start.sh | Helper script to launch MetaClaw proxy |

Backward Compatibility

  • Default: OFF. Without metaclaw_bridge.enabled: true, the pipeline is completely unchanged.
  • No new required dependencies. MetaClaw is optional.
  • All 1,823 existing tests pass with the integration code.

11. Other AI Platforms

AutoResearchClaw works with any AI coding assistant that can read project context files.

Claude Code

Claude Code automatically reads RESEARCHCLAW_CLAUDE.md (if present) when you open the project. It also loads the skill definition from .claude/skills/researchclaw/SKILL.md.

Note: RESEARCHCLAW_CLAUDE.md is generated locally and listed in .gitignore. The .claude/skills/researchclaw/SKILL.md file is always available in the repo.

You: Research the impact of attention mechanisms on speech recognition
Claude: [Reads project context, runs the pipeline, returns results]

Copilot CLI (GitHub)

GitHub Copilot can be used as an ACP agent through the gh command (GitHub CLI with the Copilot extension). Set the ACP agent to gh in your config:

llm:
  provider: "acp"
  acp:
    agent: "gh"
    cwd: "."

Prerequisites:

  1. Install GitHub CLI (gh)
  2. Install the Copilot extension: gh extension install github/gh-copilot
  3. Authenticate: gh auth login

OpenCode

OpenCode loads skills from .claude/skills/. The researchclaw skill activates on research-related queries and guides the agent through the pipeline.

Any AI CLI

Provide RESEARCHCLAW_AGENTS.md (if generated locally) or README.md as context to any AI assistant. RESEARCHCLAW_AGENTS.md contains:

  • The agent role definition (research orchestrator)
  • Quick setup instructions
  • Pipeline stage reference
  • Decision guide for common scenarios

The agent reads this file and knows how to install, configure, and run the pipeline. If the file is not present, the README.md and .claude/skills/researchclaw/SKILL.md provide sufficient context for any AI assistant to operate the pipeline.


12. Python API

For programmatic use or custom integrations:

from researchclaw.pipeline.runner import execute_pipeline
from researchclaw.config import RCConfig
from researchclaw.adapters import AdapterBundle
from pathlib import Path

# Load configuration
config = RCConfig.load("config.yaml", check_paths=False)

# Run the full pipeline
results = execute_pipeline(
    run_dir=Path("artifacts/my-run"),
    run_id="run-001",
    config=config,
    adapters=AdapterBundle(),
    auto_approve_gates=True,
)

# Check results
for result in results:
    print(f"Stage {result.stage.name}: {result.status.value}")

Iterative Pipeline (Multiple Paper Revisions)

from researchclaw.pipeline.runner import execute_iterative_pipeline

results = execute_iterative_pipeline(
    run_dir=Path("artifacts/my-run"),
    run_id="run-001",
    config=config,
    adapters=AdapterBundle(),
    max_iterations=3,       # Re-run paper writing up to 3 times
    convergence_rounds=2,   # Stop if quality stabilizes for 2 rounds
)

Literature Search Only

from researchclaw.literature.search import search_papers

papers = search_papers("transformer attention mechanisms", limit=20)
for p in papers:
    print(f"{p.title} ({p.year}) — cited {p.citation_count}x")
    print(p.to_bibtex())

13. Troubleshooting

Pre-Run Diagnostics

# Check everything: Python version, dependencies, API connectivity, config validity
researchclaw doctor --config config.yaml

Common Issues

| Problem | Cause | Solution |
|---|---|---|
| Missing required field: llm.base_url | Config incomplete | Set llm.base_url and llm.api_key (or api_key_env) |
| Config validation FAILED | Invalid YAML or missing fields | Run researchclaw validate -c config.yaml for details |
| Preflight check... FAILED | LLM API unreachable | Check base_url, API key, and network connectivity |
| Sandbox execution fails | Python path wrong or missing packages | Verify experiment.sandbox.python_path exists; ensure numpy is installed |
| Code validation rejects all attempts | LLM generates unsafe code | Switch to simulated mode, or try a more capable model |
| Gate stage blocks pipeline | Manual approval required | Use --auto-approve for autonomous mode |
| Pipeline fails mid-run | Transient API error | Run with --resume to continue from the last checkpoint |
| Citations marked HALLUCINATED | LLM invented fake references | This is expected — Stage 23 catches these. Use references_verified.bib instead |
| LaTeX won't compile | Missing style packages | Install the conference style files, or use tectonic, which auto-downloads them |

Resuming a Failed Run

# Resume from the exact point of failure
researchclaw run -c config.yaml --resume --auto-approve

# Or restart from a specific stage
researchclaw run -c config.yaml --from-stage EXPERIMENT_RUN --auto-approve --output artifacts/<run-id>

Reading a Run Report

researchclaw report --run-dir artifacts/rc-20260310-143200-a1b2c3

This prints a human-readable summary: which stages passed, which failed, key metrics, and paper quality scores.


14. FAQ

Q: How much does a full pipeline run cost in API credits? A: Depends on your model and topic complexity. A typical run with GPT-4o makes ~35-60 API calls across all 23 stages (paper drafting now uses 3 sequential calls for section-by-section writing). Expect roughly $3-12 per run. Simulated mode uses slightly fewer tokens since it doesn't generate real experiment code.

Q: Can I use a local LLM (Ollama, vLLM, etc.)? A: Yes — any OpenAI-compatible endpoint works. Set llm.base_url to your local server (e.g., http://localhost:11434/v1 for Ollama). Quality depends heavily on the model's capabilities.

Q: Can I run only part of the pipeline? A: Yes. Use --from-stage STAGE_NAME to start from any stage. The stage reads its inputs from previously generated artifacts, so the earlier stages must have completed at least once.

Q: Are the literature references real? A: Yes. Stage 4 uses a multi-source strategy (arXiv-first, then Semantic Scholar) with query expansion to find real papers with real titles, DOIs, and citation counts. The pipeline typically collects 100-200 candidates and aims for 30-60 references in the final paper. Stage 23 then verifies every reference to catch any that the LLM might have hallucinated during paper writing.

Q: Can I use this for a real paper submission? A: AutoResearchClaw is a research tool, not a paper mill. The output is a strong first draft that should be reviewed, improved, and validated by a human researcher before submission. Think of it as an extremely thorough research assistant.

Q: What happens if the LLM API goes down mid-run? A: The pipeline checkpoints after every stage. Use --resume to pick up where it left off. Failed stages are retried according to the max_retries setting in each stage's contract.

Q: Can I change the research topic mid-run? A: Not recommended — the pipeline builds on prior stages' outputs. Start a new run with the new topic instead.


Last updated: March 2026 · AutoResearchClaw v0.3.1+