AutoResearchClaw Integration Guide

The simplest way to use AutoResearchClaw: give the repo URL to OpenClaw and say "Research [your topic]." That's it — OpenClaw handles cloning, installing, configuring, and running the entire 23-stage pipeline for you.

This guide is for humans who want to understand what's happening under the hood, or who prefer to set things up manually.

1. The Easy Way: OpenClaw
2. Manual Setup
3. Configuration Walkthrough
4. Running the Pipeline
5. Understanding the 23 Stages
6. Output Artifacts
7. Experiment Modes
8. Conference Templates
9. OpenClaw Bridge (Advanced)
10. MetaClaw Integration (Cross-Run Learning)
11. Other AI Platforms
12. Python API
13. Troubleshooting
14. FAQ

1. The Easy Way: OpenClaw

If you use OpenClaw as your AI assistant, you don't need to read the rest of this guide.

Steps

1. Share the GitHub repo URL with OpenClaw:

   https://github.com/aiming-lab/AutoResearchClaw

2. OpenClaw reads RESEARCHCLAW_AGENTS.md and README.md, after which it understands the entire system.

   Note: RESEARCHCLAW_AGENTS.md is generated locally and listed in .gitignore. If it doesn't exist, OpenClaw can bootstrap from README.md and the project structure.

3. Say something like:

   Research the application of graph neural networks in drug discovery

OpenClaw will:
- Clone the repo
- Create a virtual environment and install dependencies (pip install -e .)
- Copy config.researchclaw.example.yaml → config.yaml
- Ask you for an OpenAI API key (or use your environment variable)
- Run the full 23-stage pipeline
- Return the paper, experiment code, charts, and citations

That's the whole process. OpenClaw is designed to read agent definition files and bootstrap itself. AutoResearchClaw ships with these files specifically so that any OpenClaw-compatible AI assistant can pick it up and run.

What if I want to tweak settings?

Tell OpenClaw in natural language:

"Use GPT-5.2 instead of GPT-4o"
"Run experiments in sandbox mode, not simulated"
"Target ICLR 2025 format instead of NeurIPS"
"Skip the quality gate, just auto-approve everything"

OpenClaw will modify config.yaml accordingly before running the pipeline.

2. Manual Setup

Prerequisites

Requirement	Details
Python	3.11 or newer
LLM API	Any OpenAI-compatible endpoint (OpenAI, Azure, local proxy, etc.)
Disk space	~100 MB for the repo + artifacts per run
Network	Required for LLM API calls and literature search (Semantic Scholar, arXiv)

Installation

# Clone the repository
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw

# Create a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate    # macOS/Linux
# .venv\Scripts\activate     # Windows

# Install
pip install -e .

Verify Installation

# Check the CLI is available
researchclaw --help

# Validate your configuration
researchclaw validate --config config.yaml

3. Configuration Walkthrough

Start from the provided template:

cp config.researchclaw.example.yaml config.yaml

Open config.yaml in your editor. Here's what each section does:

LLM Settings (Required)

This is the only section you must configure. Everything else has sensible defaults.

llm:
  base_url: "https://api.openai.com/v1"     # Your LLM API endpoint
  api_key_env: "OPENAI_API_KEY"              # Environment variable name...
  api_key: ""                                # ...or paste the key directly here
  primary_model: "gpt-4o"                    # Model to use (gpt-4o, gpt-5.2, etc.)
  fallback_models:                           # Tried in order if primary fails
    - "gpt-4.1"
    - "gpt-4o-mini"
  s2_api_key: ""                             # Optional: Semantic Scholar API key for higher rate limits
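The `fallback_models` comment describes an ordered retry: the primary model first, then each fallback in turn. A minimal sketch of that ordering follows. It assumes a hypothetical `call_model(model, prompt)` callable that raises on failure; neither that callable nor `complete_with_fallback` is part of AutoResearchClaw's documented API.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    prompt: str,
    primary_model: str,
    fallback_models: Sequence[str],
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> text
) -> tuple[str, str]:
    """Try the primary model, then each fallback in order.

    Returns (model_used, response). Raises RuntimeError if every model fails.
    """
    errors: list[str] = []
    for model in [primary_model, *fallback_models]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def _fake_call(model: str, prompt: str) -> str:
    # Stand-in for a real API call: only "gpt-4.1" succeeds here.
    if model != "gpt-4.1":
        raise ConnectionError("unavailable")
    return f"[{model}] {prompt}"


used, text = complete_with_fallback(
    "hello", "gpt-4o", ["gpt-4.1", "gpt-4o-mini"], _fake_call
)
print(used, text)
```

Collecting the per-model errors keeps the final exception informative when the whole chain fails.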

Using an environment variable (recommended for security):

export OPENAI_API_KEY="sk-..."

Using a direct key (simpler, less secure):

llm:
  api_key: "sk-your-key-here"
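The two options above can be thought of as a small lookup. The sketch below assumes that a non-empty `api_key` takes precedence over `api_key_env`; the guide does not state the actual precedence, and `resolve_api_key` is a hypothetical helper, not a function from the project.

```python
import os
from typing import Mapping


def resolve_api_key(
    llm_cfg: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the API key for an `llm` config mapping.

    Assumed precedence: a non-empty `api_key` wins; otherwise the variable
    named by `api_key_env` is read from the environment.
    """
    direct = llm_cfg.get("api_key", "")
    if direct:
        return direct
    env_name = llm_cfg.get("api_key_env", "")
    key = environ.get(env_name, "") if env_name else ""
    if not key:
        raise ValueError("no API key found: set llm.api_key or export the key variable")
    return key


# Key taken from the environment (passed explicitly here for the demo).
key_from_env = resolve_api_key(
    {"api_key_env": "OPENAI_API_KEY", "api_key": ""},
    {"OPENAI_API_KEY": "sk-from-env"},
)
# Key pasted directly into the config.
key_direct = resolve_api_key({"api_key": "sk-direct"}, {})
```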

Using a proxy or alternative provider:

llm:
  base_url: "https://your-proxy.example.com/v1"
  api_key: "your-proxy-key"
  primary_model: "gpt-4o"    # Must be supported by your endpoint

Research Settings

research:
  topic: "Your research topic here"    # Can also be set via CLI --topic flag
  domains:
    - "machine-learning"               # Guides literature search scope
  daily_paper_count: 10                # Target papers to collect
  quality_threshold: 4.0               # Minimum paper quality score (1-5)

Experiment Settings

experiment:
  mode: "sandbox"              # How experiments run (see Section 7)
  time_budget_sec: 300         # Max seconds per experiment run
  max_iterations: 10           # Max refinement loops in Stage 13
  metric_key: "primary_metric" # What metric to optimize
  metric_direction: "minimize" # "minimize" or "maximize"
  sandbox:
    python_path: ".venv/bin/python3"   # Python binary for sandbox execution
    gpu_required: false
    max_memory_mb: 4096
  code_agent:                        # CodeAgent v2 (multi-phase code generation)
    enabled: true                    # Architecture planning + sequential file gen + hard validation
  benchmark_agent:                   # Automated dataset & baseline selection
    enabled: true                    # 4-agent pipeline: Surveyor→Selector→Acquirer→Validator
  figure_agent:                      # Academic figure generation
    enabled: true                    # 5-agent pipeline: Planner→CodeGen→Renderer→Critic→Integrator
  repair:                            # Anti-fabrication experiment repair
    enabled: true                    # Diagnose and fix failed experiments before paper writing
    max_cycles: 3                    # Repair retry loops
  opencode:                          # OpenCode Beast Mode (see README for details)
    enabled: true
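To make `metric_direction` concrete, here is a small sketch of choosing the best value across runs. `best_metric` is a hypothetical helper, and skipping NaN or infinite values is an assumption made for illustration; the pipeline's own selection logic may differ.

```python
import math
from typing import Iterable, Optional


def best_metric(values: Iterable[float], direction: str) -> Optional[float]:
    """Pick the best value under "minimize" or "maximize", ignoring NaN/inf."""
    if direction not in ("minimize", "maximize"):
        raise ValueError(f"unknown metric_direction: {direction!r}")
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return None  # every run diverged or produced no usable metric
    return min(finite) if direction == "minimize" else max(finite)


losses = [0.91, float("nan"), 0.47, 0.52]
print(best_metric(losses, "minimize"))
```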

Export Settings

export:
  target_conference: "neurips_2025"   # See Section 8 for all available templates
  authors: "Anonymous"                 # Author line in the paper
  bib_file: "references"              # BibTeX file name (without .bib)

Everything Else (Optional)

These have reasonable defaults. Change them only if you need to:

project:
  name: "my-research"      # Just an identifier for your run
  mode: "full-auto"         # "docs-first", "semi-auto", or "full-auto"

runtime:
  timezone: "America/New_York"
  max_parallel_tasks: 3
  approval_timeout_hours: 12
  retry_limit: 2

security:
  hitl_required_stages: [5, 9, 20]     # Stages that pause for human approval
  allow_publish_without_approval: false

notifications:
  channel: "console"        # "console", "discord", or "slack"

knowledge_base:
  backend: "markdown"
  root: "docs/kb"

4. Running the Pipeline

Basic Run

# Run with topic from config.yaml
researchclaw run --config config.yaml --auto-approve

# Override topic from command line
researchclaw run --config config.yaml --topic "Transformer attention for time series" --auto-approve

CLI Commands

Command	What It Does
`researchclaw setup`	Interactive first-time setup (installs OpenCode Beast Mode, checks Docker/LaTeX)
`researchclaw init`	Interactive config creation (choose LLM provider, creates `config.arc.yaml`)
`researchclaw run`	Run the full 23-stage pipeline
`researchclaw validate`	Check your config file for errors
`researchclaw doctor`	Diagnose environment issues (Python, dependencies, API connectivity)
`researchclaw report --run-dir <path>`	Generate a human-readable summary of a completed run

Run Flags

Flag	Effect
`--topic "..."`	Override the topic in config.yaml
`--config path`	Config file path (default: `config.yaml`)
`--output path`	Output directory (default: `artifacts/<run-id>/`)
`--auto-approve`	Skip manual approval at gate stages (5, 9, 20)
`--from-stage STAGE_NAME`	Start from a specific stage (e.g., `PAPER_OUTLINE`)
`--resume`	Resume from the last checkpoint (auto-detects the most recent run matching your topic)
`--skip-preflight`	Skip LLM connectivity check before starting
`--skip-noncritical-stage`	Skip non-critical stages on failure instead of aborting
`--no-graceful-degradation`	Fail pipeline on quality gate failure instead of degrading gracefully

Examples

# Full autonomous run — no human intervention
researchclaw run -c config.yaml -t "Graph neural networks for protein folding" --auto-approve

# Resume a failed run from where it stopped
researchclaw run -c config.yaml --resume --auto-approve

# Re-run just the paper writing stages
researchclaw run -c config.yaml --from-stage PAPER_OUTLINE --auto-approve

# Check your setup before running
researchclaw doctor -c config.yaml

5. Understanding the 23 Stages

The pipeline runs in 8 phases. Each stage reads artifacts from previous stages and produces new ones.

Phase A: Research Scoping

#	Stage	What Happens	Produces
1	TOPIC_INIT	LLM formulates a SMART research goal; auto-detects GPU hardware (NVIDIA/MPS/CPU)	`goal.md`, `hardware_profile.json`
2	PROBLEM_DECOMPOSE	Breaks the goal into prioritized sub-questions	`problem_tree.md`

Phase B: Literature Discovery

#	Stage	What Happens	Produces
3	SEARCH_STRATEGY	Plans search queries and data sources	`search_plan.yaml`, `sources.json`
4	LITERATURE_COLLECT	Queries real APIs (arXiv-first, then Semantic Scholar) with expanded queries for broad coverage	`candidates.jsonl`
5	LITERATURE_SCREEN	[Gate] Filters by relevance and quality	`shortlist.jsonl`
6	KNOWLEDGE_EXTRACT	Extracts structured knowledge cards from each paper	`cards/`

Phase C: Knowledge Synthesis

#	Stage	What Happens	Produces
7	SYNTHESIS	Clusters findings, identifies research gaps	`synthesis.md`
8	HYPOTHESIS_GEN	Generates falsifiable hypotheses	`hypotheses.md`

Phase D: Experiment Design

#	Stage	What Happens	Produces
9	EXPERIMENT_DESIGN	[Gate] Designs experiment plan with baselines and metrics	`exp_plan.yaml`
10	CODE_GENERATION	LLM writes hardware-aware experiment code (adapts packages/constraints to GPU tier)	`experiment.py`, `experiment_spec.md`
11	RESOURCE_PLANNING	Estimates GPU/time requirements	`schedule.json`

Phase E: Experiment Execution

#	Stage	What Happens	Produces
12	EXPERIMENT_RUN	Runs the experiment code (sandbox or simulated); immutable harness injected for time guard and metric validation; partial results captured on timeout	`runs/`
13	ITERATIVE_REFINE	LLM analyzes results, improves code, re-runs (up to 10 iterations); timeout-aware prompts; NaN/divergence fast-fail; stdout truncated for context efficiency	`refinement_log.json`, `experiment_final.py`

Phase F: Analysis & Decision

#	Stage	What Happens	Produces
14	RESULT_ANALYSIS	Statistical analysis of experiment results	`analysis.md`
15	RESEARCH_DECISION	PROCEED / PIVOT decision with evidence	`decision.md`

Phase G: Paper Writing

#	Stage	What Happens	Produces
16	PAPER_OUTLINE	Creates section-level paper outline	`outline.md`
17	PAPER_DRAFT	Writes paper section-by-section (3 LLM calls, 5,000-6,500 words); hard-blocked when no experiment metrics (anti-fabrication); conference-grade title guidelines and abstract structure injected	`paper_draft.md`
18	PEER_REVIEW	Simulates 2+ reviewer perspectives with NeurIPS/ICML rubric (1-10 scoring); checks baselines, ablations, claims vs evidence	`reviews.md`
19	PAPER_REVISION	Addresses review comments with length guard (auto-retries if revised paper is shorter than draft)	`paper_revised.md`

Phase H: Finalization

#	Stage	What Happens	Produces
20	QUALITY_GATE	[Gate] Checks paper quality score	`quality_report.json`
21	KNOWLEDGE_ARCHIVE	Saves retrospective + reproducibility bundle	`archive.md`, `bundle_index.json`
22	EXPORT_PUBLISH	Generates LaTeX, charts, and code package	`paper_final.md`, `paper.tex`, `code/`
23	CITATION_VERIFY	Fact-checks all references against real APIs	`verification_report.json`, `references_verified.bib`

Gate Stages

Three stages pause for human review (unless --auto-approve is set):

Gate	What's Being Reviewed	On Reject, Rolls Back To
Stage 5	Are the collected papers relevant and sufficient?	Stage 4 (re-collect literature)
Stage 9	Is the experiment design sound?	Stage 8 (re-generate hypotheses)
Stage 20	Does the paper meet quality standards?	Stage 16 (re-write from outline)

For fully autonomous operation, always use --auto-approve.
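The rollback table can be read as a small state-transition rule. The sketch below encodes it under the assumption that stages are identified by their numbers; `GATE_ROLLBACK` and `next_stage` are illustrative names, not the pipeline's internals.

```python
# Gate stage -> stage the pipeline returns to on rejection (from the table above).
GATE_ROLLBACK = {5: 4, 9: 8, 20: 16}


def next_stage(current: int, approved: bool, auto_approve: bool = False) -> int:
    """Return the stage number to run after `current` finishes.

    Non-gate stages always advance. Gate stages advance when approved (or when
    auto-approve is on) and otherwise roll back per GATE_ROLLBACK.
    """
    if current in GATE_ROLLBACK and not (approved or auto_approve):
        return GATE_ROLLBACK[current]
    return current + 1


print(next_stage(5, approved=False))                      # rejected at the literature gate
print(next_stage(9, approved=False, auto_approve=True))   # gate skipped
```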

6. Output Artifacts

Each run creates a timestamped directory under artifacts/:

artifacts/rc-20260310-143200-a1b2c3/
├── stage-1/goal.md                        # Research goal
├── stage-2/problem_tree.md                # Problem decomposition
├── stage-3/search_plan.yaml               # Search strategy
├── stage-4/candidates.jsonl               # Raw literature results
├── stage-5/shortlist.jsonl                # Screened papers
├── stage-6/cards/                         # Knowledge cards (one per paper)
├── stage-7/synthesis.md                   # Research gap analysis
├── stage-8/hypotheses.md                  # Research hypotheses
├── stage-9/exp_plan.yaml                  # Experiment plan
├── stage-10/experiment.py                 # Generated experiment code
├── stage-10/experiment_spec.md            # Experiment specification
├── stage-11/schedule.json                 # Resource schedule
├── stage-12/runs/run-1.json               # Experiment results
├── stage-13/experiment_final.py           # Refined experiment code
├── stage-13/experiment_v1.py              # Iteration 1 snapshot
├── stage-13/refinement_log.json           # Refinement history
├── stage-14/analysis.md                   # Statistical analysis
├── stage-14/experiment_summary.json       # Metrics summary
├── stage-15/decision.md                   # Proceed/Pivot decision
├── stage-16/outline.md                    # Paper outline
├── stage-17/paper_draft.md                # Full paper draft
├── stage-18/reviews.md                    # Simulated peer reviews
├── stage-19/paper_revised.md              # Revised paper
├── stage-20/quality_report.json           # Quality assessment
├── stage-21/archive.md                    # Knowledge retrospective
├── stage-22/
│   ├── paper_final.md                     # Final paper (Markdown)
│   ├── paper.tex                          # Conference-ready LaTeX
│   ├── references.bib                     # BibTeX references
│   ├── charts/                            # Result visualizations
│   └── code/                              # Open-source code package
│       ├── experiment.py
│       ├── requirements.txt
│       └── README.md
├── stage-23/
│   ├── verification_report.json           # Citation fact-check results
│   └── references_verified.bib            # Cleaned bibliography
└── pipeline_summary.json                  # Overall execution summary

Key Output Files

File	What You'll Use It For
`stage-22/paper.tex`	Submit to a conference (compile with `pdflatex` or `tectonic`)
`stage-22/paper_final.md`	Read or further edit the paper
`stage-22/references.bib`	Bibliography for LaTeX compilation
`stage-22/code/`	Share experiment code alongside the paper
`stage-23/verification_report.json`	Check which citations are real vs. hallucinated
`stage-13/experiment_final.py`	The best-performing experiment code
`stage-22/charts/`	Figures for the paper
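If you post-process runs with your own scripts, a sketch like the following can locate the newest run and report which key files are missing. It assumes run directory names follow the `rc-YYYYMMDD-HHMMSS-...` pattern shown above, so that lexicographic order matches chronological order; `latest_run` and `missing_outputs` are hypothetical helpers.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Relative paths taken from the "Key Output Files" table.
KEY_OUTPUTS = [
    "stage-22/paper.tex",
    "stage-22/paper_final.md",
    "stage-22/references.bib",
    "stage-23/verification_report.json",
    "stage-13/experiment_final.py",
]


def latest_run(artifacts_dir: Path) -> Optional[Path]:
    """Return the newest rc-* run directory, or None if there are none."""
    runs = sorted(p for p in artifacts_dir.glob("rc-*") if p.is_dir())
    return runs[-1] if runs else None


def missing_outputs(run_dir: Path) -> list[str]:
    """List the key output files that do not exist in run_dir."""
    return [rel for rel in KEY_OUTPUTS if not (run_dir / rel).is_file()]


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("rc-20260309-090000-aaaaaa", "rc-20260310-143200-a1b2c3"):
        (root / name / "stage-22").mkdir(parents=True)
    newest = latest_run(root)
    (newest / "stage-22" / "paper.tex").write_text("% placeholder\n")
    newest_name = newest.name
    still_missing = missing_outputs(newest)
```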

7. Experiment Modes

AutoResearchClaw supports four modes for running experiments:

Simulated (Default)

experiment:
  mode: "simulated"

The LLM generates synthetic experiment results without executing any code. This is fast and requires no special setup, but the results are not real.

Best for: Quick prototyping, testing the pipeline end-to-end, environments without Python scientific packages.

Sandbox

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python3"
    gpu_required: false
    max_memory_mb: 4096

The pipeline generates Python code and actually runs it in a subprocess. The code is validated before execution (AST parsing, import whitelist, no file I/O outside sandbox). Hardware-aware: Stage 1 auto-detects your GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts the generated code accordingly — high-tier GPUs get full PyTorch code, limited GPUs get lightweight experiments, CPU-only gets NumPy/sklearn only.

Best for: Real experiments on your local machine. Supports numpy and stdlib; deep learning frameworks (torch, tensorflow) are available if installed in your environment and GPU is detected.

Safety features:

- Code validation blocks dangerous operations (subprocess, eval, exec, network calls)
- Configurable memory limit and execution timeout
- Auto-repair: if generated code has validation errors, the LLM fixes them (up to 3 attempts)
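To illustrate the kind of check that AST parsing and an import whitelist make possible, here is a minimal sketch built on Python's standard `ast` module. It is not the project's validator: the whitelist, the blocklist, and `validate_source` are illustrative assumptions, and a real validator would need to cover far more cases (attribute access, dynamic imports, file paths).

```python
import ast

ALLOWED_IMPORTS = {"math", "random", "json", "numpy"}      # illustrative whitelist
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}  # illustrative blocklist


def validate_source(source: str) -> list[str]:
    """Return a list of violations; an empty list means the code passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    problems: list[str] = []
    for node in ast.walk(tree):
        # Collect module names from both import forms.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"line {node.lineno}: import of '{module}' not allowed")
        # Flag direct calls to blocked builtins such as eval(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            problems.append(f"line {node.lineno}: call to '{node.func.id}' not allowed")
    return problems


safe = "import math\nprint(math.sqrt(2))\n"
unsafe = "import subprocess\neval('1+1')\n"
print(validate_source(safe))
print(validate_source(unsafe))
```

Returning a list of messages rather than a boolean fits the auto-repair idea: the violations can be fed back to the LLM as repair hints.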

Docker

experiment:
  mode: "docker"
  docker:
    image: "researchclaw/experiment:latest"
    gpu_enabled: true
    memory_limit_mb: 8192
    network_policy: "setup_only"   # none | setup_only | pip_only | full
    auto_install_deps: true
    shm_size_mb: 2048

The pipeline runs generated code inside a Docker container with GPU passthrough, dependency auto-installation, and network isolation. Execution follows a three-phase model within a single container:

- Phase 0 (pip install): Installs auto-detected dependencies from requirements.txt (network enabled)
- Phase 1 (setup.py): Runs setup.py for dataset downloads and environment preparation (network enabled)
- Phase 2 (experiment): Executes the experiment code (network disabled by default via iptables)

Network policies:

- none: No network at all (all phases offline). Requires all deps pre-installed in image.
- setup_only (default): Network during Phase 0+1, disabled before Phase 2 via iptables (--cap-add=NET_ADMIN).
- pip_only: Network only during Phase 0 (pip install), disabled for Phase 1+2.
- full: Network available throughout all phases.
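The four policies reduce to a lookup from policy name to the set of phases that have network access. The sketch below encodes the list above; `NETWORK_PHASES` and `network_enabled` are illustrative names, not identifiers from the codebase.

```python
# Phases in execution order: 0 = pip install, 1 = setup.py, 2 = experiment.
NETWORK_PHASES = {
    "none": set(),
    "setup_only": {0, 1},
    "pip_only": {0},
    "full": {0, 1, 2},
}


def network_enabled(policy: str, phase: int) -> bool:
    """Return True if `phase` may use the network under `policy`."""
    if policy not in NETWORK_PHASES:
        raise ValueError(f"unknown network_policy: {policy!r}")
    if phase not in (0, 1, 2):
        raise ValueError(f"unknown phase: {phase!r}")
    return phase in NETWORK_PHASES[policy]


for policy in NETWORK_PHASES:
    print(policy, [network_enabled(policy, phase) for phase in (0, 1, 2)])
```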

Pre-cached datasets: The Docker image includes CIFAR-10/100, MNIST, FashionMNIST, STL-10, and SVHN at /opt/datasets, mounted read-only as /workspace/data. No download needed for these standard benchmarks.

Best for: Reproducible experiments with full dependency isolation. Supports GPU passthrough (NVIDIA) and configurable network policies.

Setup: Build the image first:

docker build -t researchclaw/experiment:latest researchclaw/docker/

SSH Remote

experiment:
  mode: "ssh_remote"
  ssh_remote:
    host: "gpu-server.example.com"
    gpu_ids: [0, 1]
    remote_workdir: "/tmp/researchclaw_experiments"

The pipeline sends generated code to a remote GPU server for execution.

Best for: Experiments that require GPU hardware you don't have locally.

8. Conference Templates

AutoResearchClaw generates LaTeX files formatted for specific conferences:

export:
  target_conference: "neurips_2025"

Conference	Config Value	Layout
NeurIPS 2025	`neurips_2025` (default)	Single-column, `neurips_2025` style
NeurIPS 2024	`neurips_2024`	Single-column, `neurips_2024` style
ICLR 2026	`iclr_2026`	Single-column, `iclr2026_conference` style
ICLR 2025	`iclr_2025`	Single-column, `iclr2025_conference` style
ICML 2026	`icml_2026`	Double-column, `icml2026` style
ICML 2025	`icml_2025`	Double-column, `icml2025` style

Short aliases are also accepted: neurips (→ 2025), iclr (→ 2026), icml (→ 2026).
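As a quick reference, the table and aliases above can be expressed as a lookup. `resolve_conference` is a hypothetical helper, and lowercasing the input is an assumption; the actual resolver may be stricter.

```python
TEMPLATES = {
    "neurips_2025", "neurips_2024",
    "iclr_2026", "iclr_2025",
    "icml_2026", "icml_2025",
}
ALIASES = {"neurips": "neurips_2025", "iclr": "iclr_2026", "icml": "icml_2026"}


def resolve_conference(value: str) -> str:
    """Map a config value or short alias to a full template name."""
    key = value.strip().lower()
    key = ALIASES.get(key, key)
    if key not in TEMPLATES:
        raise ValueError(f"unknown target_conference: {value!r}")
    return key


print(resolve_conference("iclr"))
```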

The Markdown-to-LaTeX converter handles:

- Section headings (#, ##, ###)
- Inline and display math ($...$, $$...$$)
- Bold and italic text
- Ordered and unordered lists
- Tables
- Code blocks
- Citation references ([cite_key] → \cite{cite_key})
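The citation rule in the list above is essentially a pattern substitution. The following sketch shows one way to do it with a regular expression. It is not the project's converter: the allowed key characters and the rule that a bracket followed by "(" is a Markdown link are assumptions.

```python
import re

# Assumption: cite keys use letters, digits, underscores, colons, or hyphens,
# and a bracketed key immediately followed by "(" is a Markdown link.
_CITE = re.compile(r"\[([A-Za-z0-9_:\-]+)\](?!\()")


def convert_citations(markdown: str) -> str:
    """Rewrite [cite_key] as \\cite{cite_key}, leaving Markdown links alone."""
    return _CITE.sub(lambda m: r"\cite{" + m.group(1) + "}", markdown)


print(convert_citations("GNNs scale well [kipf2017]; see [docs](https://example.com)."))
```

Using a function as the replacement avoids backslash-escaping pitfalls in the replacement string.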

Compiling the LaTeX

# Using tectonic (recommended)
tectonic artifacts/<run-id>/stage-22/paper.tex

# Using pdflatex
cd artifacts/<run-id>/stage-22/
pdflatex paper.tex
bibtex paper
pdflatex paper.tex
pdflatex paper.tex

9. OpenClaw Bridge (Advanced)

For deeper integration with OpenClaw, AutoResearchClaw includes a bridge adapter system. Each flag in the config activates a typed protocol interface:

openclaw_bridge:
  use_cron: true              # Scheduled research runs
  use_message: true           # Progress notifications (Discord/Slack/Telegram)
  use_memory: true            # Cross-session knowledge persistence
  use_sessions_spawn: true    # Spawn parallel sub-sessions for concurrent stages
  use_web_fetch: true         # Live web search during literature review
  use_browser: false          # Browser-based paper collection

What Each Adapter Does

Adapter	Protocol	Use Case
Cron	`CronAdapter.schedule_resume(run_id, stage_id, reason)`	Schedule pipeline resumption (e.g., daily re-runs)
Message	`MessageAdapter.notify(channel, subject, body)`	Send progress updates to chat platforms
Memory	`MemoryAdapter.append(namespace, content)`	Persist knowledge across sessions
Sessions	`SessionsAdapter.spawn(name, command)`	Run pipeline stages in parallel sub-sessions
WebFetch	`WebFetchAdapter.fetch(url)`	Fetch web pages during literature search
Browser	`BrowserAdapter.open(url)`	Open and interact with web pages

When OpenClaw provides a capability (e.g., message sending), the adapter consumes it automatically. When running standalone, recording stubs capture all calls for debugging without side effects.
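The recording-stub idea can be sketched with a typed protocol and a stand-in that only logs calls. The `notify(channel, subject, body)` signature comes from the table above; the `None` return type, `RecordingMessageAdapter`, and `report_stage_done` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


class MessageAdapter(Protocol):
    """Structural type for anything that can send a notification."""

    def notify(self, channel: str, subject: str, body: str) -> None: ...


@dataclass
class RecordingMessageAdapter:
    """Stub that records calls instead of sending anything."""

    calls: list[tuple[str, str, str]] = field(default_factory=list)

    def notify(self, channel: str, subject: str, body: str) -> None:
        self.calls.append((channel, subject, body))


def report_stage_done(adapter: MessageAdapter, stage: str) -> None:
    # Works with a real adapter or the stub, since both satisfy the protocol.
    adapter.notify("console", f"{stage} finished", "Stage completed without errors.")


stub = RecordingMessageAdapter()
report_stage_done(stub, "LITERATURE_SCREEN")
print(stub.calls)
```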

This is an extension point — you don't need to configure it for basic usage.

10. MetaClaw Integration (Cross-Run Learning)

MetaClaw adds cross-run knowledge transfer to AutoResearchClaw. When enabled, the pipeline automatically captures lessons from failures and converts them into reusable skills that improve subsequent runs.

Architecture

┌──────────────────────────────────────────────────────┐
│              AutoResearchClaw Pipeline                │
│  Stage 1 → 2 → ... → 23                             │
│                                                      │
│  ┌─────────────┐    ┌──────────────────────────────┐ │
│  │ LLMClient   │───▶│ MetaClaw Integration Layer   │ │
│  │             │    │ (metaclaw_bridge module)      │ │
│  └─────────────┘    └──────────┬───────────────────┘ │
│                                │                     │
│  ┌─────────────┐    ┌──────────▼───────────────────┐ │
│  │ Evolution   │◀──▶│ Lesson ↔ Skill Bridge        │ │
│  │ Store       │    └─────────────────────────────┘ │
│  └─────────────┘                                     │
└──────────────────────────┬───────────────────────────┘
                           │
            ┌──────────────▼──────────────┐
            │     MetaClaw Proxy Server    │
            │     (optional, :30000)       │
            │  ┌────────────────────────┐  │
            │  │ SkillManager (40+ skills)│ │
            │  │ + arc-* learned skills   │ │
            │  └────────────────────────┘  │
            └─────────────────────────────┘

How It Works

1. Lesson Capture: During each pipeline run, the EvolutionStore automatically records failures, warnings, and anomalies as structured lessons in evolution/lessons.jsonl.
2. Lesson → Skill Conversion: After a run completes, lessons above a configurable severity threshold are converted into arc-* skill files stored in ~/.metaclaw/skills/. Each skill contains: trigger conditions, failure root cause, and actionable guidance.
3. Skill Injection: On the next run, build_overlay() reads all arc-* skills and injects them into the LLM prompt for every stage via the evolution_overlay parameter. The LLM receives explicit instructions to avoid previously encountered pitfalls.
4. Proxy Routing (Optional): When the MetaClaw proxy is running, LLM requests are routed through it for additional skill matching and session tracking. If the proxy is unavailable, requests automatically fall back to the direct LLM endpoint.
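The severity threshold and per-run cap in the conversion step can be pictured as a filter followed by a truncation. This sketch assumes each lesson is a dict with a `severity` field and that severities are ordered info < warning < error; the real lessons.jsonl schema is not documented here, and `select_lessons` is a hypothetical helper.

```python
from typing import Any

SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}  # assumed ordering


def select_lessons(
    lessons: list[dict[str, Any]],
    min_severity: str = "warning",
    max_skills_per_run: int = 5,
) -> list[dict[str, Any]]:
    """Keep lessons at or above min_severity, most severe first, capped per run."""
    threshold = SEVERITY_RANK[min_severity]
    eligible = [
        lesson for lesson in lessons
        if SEVERITY_RANK.get(lesson.get("severity", ""), -1) >= threshold
    ]
    # Stable sort: lessons of equal severity keep their original order.
    eligible.sort(key=lambda lesson: SEVERITY_RANK[lesson["severity"]], reverse=True)
    return eligible[:max_skills_per_run]


lessons = [
    {"severity": "info", "message": "cache miss"},
    {"severity": "warning", "message": "stage retried"},
    {"severity": "error", "message": "NaN loss"},
]
selected = select_lessons(lessons, min_severity="warning", max_skills_per_run=5)
print([lesson["message"] for lesson in selected])
```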

Setup

Step 1: Install MetaClaw

pip install metaclaw
# Or clone from source:
git clone https://github.com/aiming-lab/MetaClaw.git
cd MetaClaw && pip install -e .

Step 2: Configure

Add the metaclaw_bridge section to your config.arc.yaml:

metaclaw_bridge:
  enabled: true
  proxy_url: "http://localhost:30000/v1"    # MetaClaw proxy (optional)
  skills_dir: "~/.metaclaw/skills"          # Skill storage directory
  fallback_url: "https://api.openai.com/v1" # Direct LLM fallback
  fallback_api_key_env: "OPENAI_API_KEY"
  lesson_to_skill:
    enabled: true
    min_severity: "warning"                 # Convert warnings + errors
    max_skills_per_run: 5                   # Max new skills per run

Step 3: Run

# First run — captures lessons, generates initial skills
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve

# Check generated skills
ls ~/.metaclaw/skills/arc-*/SKILL.md

# Second run — skills from Run 1 are automatically injected
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve

Optional: Start MetaClaw Proxy

For full skill matching and session tracking:

metaclaw start --mode skills_only --port 30000
# Or use the provided script:
bash scripts/metaclaw_start.sh

The proxy is optional — without it, the pipeline still benefits from skill injection via build_overlay() and falls back to your configured LLM endpoint.
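Conceptually, skill injection without the proxy amounts to reading the learned skill files and adding their text to the prompt. The sketch below illustrates that idea only. It is not the real `build_overlay()`: `collect_skill_text` is a hypothetical helper, and the section formatting is an assumption.

```python
import tempfile
from pathlib import Path


def collect_skill_text(skills_dir: Path) -> str:
    """Concatenate every arc-*/SKILL.md under skills_dir into one prompt section."""
    sections = []
    for skill_file in sorted(skills_dir.glob("arc-*/SKILL.md")):
        body = skill_file.read_text(encoding="utf-8").strip()
        sections.append(f"## {skill_file.parent.name}\n{body}")
    return "\n\n".join(sections)


# Demo with a throwaway skills directory; only arc-* folders are picked up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, text in [("arc-nan-loss", "Lower the learning rate."), ("other-skill", "Ignored.")]:
        (root / name).mkdir()
        (root / name / "SKILL.md").write_text(text, encoding="utf-8")
    overlay = collect_skill_text(root)

print(overlay)
```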

Experiment Results

In controlled A/B experiments (same topic, same LLM, same configuration):

Metric	Baseline	With MetaClaw	Improvement
Stage retry rate	10.5%	7.9%	-24.8%
Refine cycle count	2.0	1.2	-40.0%
Pipeline stage completion	18/19	19/19	+5.3 pp
Overall robustness score (composite)	0.714	0.845	+18.3%

Composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and refine cycle efficiency (30%).
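The Improvement column can be checked with a few lines of arithmetic. Three rows are relative changes against the baseline. The completion row's 5.3 figure matches the absolute difference in percentage points (the relative change would be about 5.6%). The composite score itself is not recomputed here, because the table does not say how the retry and refine terms are normalized.

```python
def relative_change_pct(baseline: float, treated: float) -> float:
    """Percentage change of `treated` relative to `baseline`."""
    return (treated - baseline) / baseline * 100.0


retry = round(relative_change_pct(10.5, 7.9), 1)          # stage retry rate
refine = round(relative_change_pct(2.0, 1.2), 1)          # refine cycle count
robustness = round(relative_change_pct(0.714, 0.845), 1)  # composite score
completion_pp = round((19 / 19 - 18 / 19) * 100.0, 1)     # percentage points

print(retry, refine, robustness, completion_pp)
```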

Key Files

File	Purpose
`researchclaw/metaclaw_bridge/`	Integration module (config, session, lesson_to_skill, prm_gate, skill_feedback)
`researchclaw/evolution.py`	`build_overlay()` — reads intra-run lessons + cross-run arc-* skills
`researchclaw/llm/client.py`	Proxy routing with automatic fallback
`~/.metaclaw/skills/arc-*/SKILL.md`	Learned skill files (auto-generated)
`scripts/metaclaw_start.sh`	Helper script to launch MetaClaw proxy

Backward Compatibility

- Default: OFF. Without metaclaw_bridge.enabled: true, the pipeline is completely unchanged.
- No new required dependencies. MetaClaw is optional.
- All 1,823 existing tests pass with the integration code.

11. Other AI Platforms

AutoResearchClaw works with any AI coding assistant that can read project context files.

Claude Code

Claude Code automatically reads RESEARCHCLAW_CLAUDE.md (if present) when you open the project. It also loads the skill definition from .claude/skills/researchclaw/SKILL.md.

Note: RESEARCHCLAW_CLAUDE.md is generated locally and listed in .gitignore. The .claude/skills/researchclaw/SKILL.md file is always available in the repo.

You: Research the impact of attention mechanisms on speech recognition
Claude: [Reads project context, runs the pipeline, returns results]

Copilot CLI (GitHub)

GitHub Copilot can be used as an ACP agent via the gh CLI command (GitHub CLI with Copilot extension). Set the ACP agent to gh in your config:

llm:
  provider: "acp"
  acp:
    agent: "gh"
    cwd: "."

Prerequisites:

- Install GitHub CLI (gh)
- Install the Copilot extension: gh extension install github/gh-copilot
- Authenticate: gh auth login

OpenCode

OpenCode loads skills from .claude/skills/. The researchclaw skill activates on research-related queries and guides the agent through the pipeline.

Any AI CLI

Provide RESEARCHCLAW_AGENTS.md (if generated locally) or README.md as context to any AI assistant. RESEARCHCLAW_AGENTS.md contains:

- The agent role definition (research orchestrator)
- Quick setup instructions
- Pipeline stage reference
- Decision guide for common scenarios

The agent reads this file and knows how to install, configure, and run the pipeline. If the file is not present, the README.md and .claude/skills/researchclaw/SKILL.md provide sufficient context for any AI assistant to operate the pipeline.

12. Python API

For programmatic use or custom integrations:

from researchclaw.pipeline.runner import execute_pipeline
from researchclaw.config import RCConfig
from researchclaw.adapters import AdapterBundle
from pathlib import Path

# Load configuration
config = RCConfig.load("config.yaml", check_paths=False)

# Run the full pipeline
results = execute_pipeline(
    run_dir=Path("artifacts/my-run"),
    run_id="run-001",
    config=config,
    adapters=AdapterBundle(),
    auto_approve_gates=True,
)

# Check results
for result in results:
    print(f"Stage {result.stage.name}: {result.status.value}")

Iterative Pipeline (Multiple Paper Revisions)

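from pathlib import Path

from researchclaw.adapters import AdapterBundle
from researchclaw.config import RCConfig
from researchclaw.pipeline.runner import execute_iterative_pipeline

# Load configuration (same call as in the basic example above)
config = RCConfig.load("config.yaml", check_paths=False)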

results = execute_iterative_pipeline(
    run_dir=Path("artifacts/my-run"),
    run_id="run-001",
    config=config,
    adapters=AdapterBundle(),
    max_iterations=3,       # Re-run paper writing up to 3 times
    convergence_rounds=2,   # Stop if quality stabilizes for 2 rounds
)

Literature Search Only

from researchclaw.literature.search import search_papers

papers = search_papers("transformer attention mechanisms", limit=20)
for p in papers:
    print(f"{p.title} ({p.year}) — cited {p.citation_count}x")
    print(p.to_bibtex())

13. Troubleshooting

Pre-Run Diagnostics

# Check everything: Python version, dependencies, API connectivity, config validity
researchclaw doctor --config config.yaml

Common Issues

Problem	Cause	Solution
`Missing required field: llm.base_url`	Config incomplete	Set `llm.base_url` and `llm.api_key` (or `api_key_env`)
`Config validation FAILED`	Invalid YAML or missing fields	Run `researchclaw validate -c config.yaml` for details
`Preflight check... FAILED`	LLM API unreachable	Check `base_url`, API key, and network connectivity
Sandbox execution fails	Python path wrong or missing packages	Verify `experiment.sandbox.python_path` exists; ensure numpy is installed
Code validation rejects all attempts	LLM generates unsafe code	Switch to `simulated` mode, or try a more capable model
Gate stage blocks pipeline	Manual approval required	Use `--auto-approve` for autonomous mode
Pipeline fails mid-run	Transient API error	Run with `--resume` to continue from the last checkpoint
Citations marked HALLUCINATED	LLM invented fake references	This is expected — Stage 23 catches these. Use `references_verified.bib` instead
LaTeX won't compile	Missing style packages	Install the conference style files, or use `tectonic` which auto-downloads them

Resuming a Failed Run

# Resume from the exact point of failure
researchclaw run -c config.yaml --resume --auto-approve

# Or restart from a specific stage
researchclaw run -c config.yaml --from-stage EXPERIMENT_RUN --auto-approve --output artifacts/<run-id>

Reading a Run Report

researchclaw report --run-dir artifacts/rc-20260310-143200-a1b2c3

This prints a human-readable summary: which stages passed, which failed, key metrics, and paper quality scores.

14. FAQ

Q: How much does a full pipeline run cost in API credits?
A: It depends on your model and topic complexity. A typical run with GPT-4o makes ~35-60 API calls across all 23 stages (paper drafting uses 3 sequential calls for section-by-section writing). Expect roughly $3-12 per run. Simulated mode uses slightly fewer tokens since it doesn't generate real experiment code.

Q: Can I use a local LLM (Ollama, vLLM, etc.)?
A: Yes, any OpenAI-compatible endpoint works. Set llm.base_url to your local server (e.g., http://localhost:11434/v1 for Ollama). Quality depends heavily on the model's capabilities.

Q: Can I run only part of the pipeline?
A: Yes. Use --from-stage STAGE_NAME to start from any stage. The stage reads its inputs from previously generated artifacts, so the earlier stages must have completed at least once.

Q: Are the literature references real?
A: Yes. Stage 4 uses a multi-source strategy (arXiv-first, then Semantic Scholar) with query expansion to find real papers with real titles, DOIs, and citation counts. The pipeline typically collects 100-200 candidates and aims for 30-60 references in the final paper. Stage 23 then verifies every reference to catch any that the LLM might have hallucinated during paper writing.

Q: Can I use this for a real paper submission?
A: AutoResearchClaw is a research tool, not a paper mill. The output is a strong first draft that should be reviewed, improved, and validated by a human researcher before submission. Think of it as an extremely thorough research assistant.

Q: What happens if the LLM API goes down mid-run?
A: The pipeline checkpoints after every stage. Use --resume to pick up where it left off. Failed stages are retried according to the max_retries setting in each stage's contract.

Q: Can I change the research topic mid-run?
A: Not recommended, because the pipeline builds on prior stages' outputs. Start a new run with the new topic instead.

Last updated: March 2026 · AutoResearchClaw v0.3.1+
