feat: Research Agent v2 — pi-agent-core rewrite by GindaChen · Pull Request #296 · hao-ai-lab/research-agent

GindaChen · 2026-02-25T08:57:52Z

Research Agent v2: Full Rewrite

Complete rewrite of the research agent backend from Python/OpenCode to TypeScript/pi-agent-core.

What Changed

Deleted entire Python backend, OpenCode dependency, AgentSys, Wild Loop V2
Built lean Express 5 backend (~1,900 LOC) with pi-agent-core
Agent Engine: soul loader + seed loader (persona templates) + factory functions
Agent Mailbox: JSONL inbox/outbox with steer/queue primitives
9 Tools: bash, readFile, writeFile, listFiles, sendToMaster, sendToWorker, reportComplete, spawnWorker, listWorkers — all using TypeBox schemas + AgentToolResult
Agent Seeds: researcher/ and debugger/ persona templates (soul.md + tools.json + defaults.json)
Extension System: loader + tmux-runner extension (5 tools)
Chat Frontend: minimal dark-themed HTML/CSS/JS (no React, no build step)
Eval Harness: 5 scenarios with assertions (file_exists, file_contains, response_contains, min_tool_calls)

Architecture

server.js (38 LOC)  →  Express 5 + extension loader
lib/agent-engine.js  →  Soul + seed loading, agent factories
lib/agent-mailbox.js →  JSONL inbox/outbox, steer/queue, registry
lib/tools/           →  9 tools (coding, communication, orchestration)
agent-seeds/         →  Persona templates (researcher, debugger)
extensions/          →  Installable modules (tmux-runner)
routes/              →  chat (SSE streaming) + agents (spawn/steer/queue)
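
The chat route streams over SSE. As a rough illustration of what that means on the wire, here is a minimal sketch of formatting and parsing SSE frames. The event names (`tool_start`, `text_delta`) and payload shapes are assumptions made for the example, not taken from this PR; only the `event:` / `data:` framing with a blank-line terminator comes from the SSE format itself.

```javascript
// Format one Server-Sent Events frame.
// Event names and payloads used below are hypothetical.
function sseFrame(event, data) {
  // JSON.stringify escapes newlines, so a single data: line is enough.
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Parse a buffer of complete frames back into { event, data } objects,
// roughly what a browser client does before dispatching events.
function parseSse(buffer) {
  return buffer
    .split("\n\n")
    .filter((frame) => frame.trim() !== "")
    .map((frame) => {
      let event = "message"; // default event name when none is given
      let data = "";
      for (const line of frame.split("\n")) {
        if (line.startsWith("event: ")) event = line.slice(7);
        else if (line.startsWith("data: ")) data += line.slice(6);
      }
      return { event, data: JSON.parse(data) };
    });
}

const wire =
  sseFrame("tool_start", { name: "bash" }) +
  sseFrame("text_delta", { text: "hello\nworld" });
const events = parseSse(wire);
```

In an Express handler, frames like these would typically be sent with `res.write(...)` after setting the `Content-Type: text/event-stream` header.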

Test Results

Unit Tests: 38/38 ✅

Agent mailbox: steer tagging, queue FIFO, status management, registry, polling
Agent engine: soul loading, seed loading, agent creation
Tools: bash timeout, file I/O, composers, discovery

Eval Tests (live LLM): 5/5 ✅

Scenario	Duration	Tool Calls	Turns
file-creation	3.8s	1	2
bash-execution	4.7s	1	2
read-and-summarize	5.1s	1	2
multi-file-task	7.1s	2	2
debug-error	12.6s	3	4
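
To make the four assertion types concrete, the following is a minimal sketch of how a scenario could be checked against a finished run. The scenario and run shapes (`assertions`, `files`, `response`, `toolCalls`) and the field names `path`, `text`, and `count` are assumptions for illustration; the actual JSON scenario format in this PR may differ. The workspace is modeled as an in-memory `Map` so the example has no filesystem dependency.

```javascript
// Check one assertion against a run record.
// `run` is a hypothetical shape: { files: Map<path, contents>, response, toolCalls }.
function checkAssertion(assertion, run) {
  switch (assertion.type) {
    case "file_exists":
      return run.files.has(assertion.path);
    case "file_contains":
      return (run.files.get(assertion.path) ?? "").includes(assertion.text);
    case "response_contains":
      return run.response.includes(assertion.text);
    case "min_tool_calls":
      return run.toolCalls >= assertion.count;
    default:
      throw new Error(`unknown assertion type: ${assertion.type}`);
  }
}

// A scenario passes only if every assertion holds.
function evaluate(scenario, run) {
  const failures = scenario.assertions.filter((a) => !checkAssertion(a, run));
  return { name: scenario.name, passed: failures.length === 0, failures };
}

const scenario = {
  name: "file-creation",
  assertions: [
    { type: "file_exists", path: "hello.txt" },
    { type: "file_contains", path: "hello.txt", text: "Hello" },
    { type: "min_tool_calls", count: 1 },
  ],
};
const run = {
  files: new Map([["hello.txt", "Hello, world"]]),
  response: "Created hello.txt",
  toolCalls: 1,
};
const result = evaluate(scenario, run);
```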

Key Design Decisions

Wild Loop = pi-mono loop: No custom event queue. Two primitives: steer (urgent) and queue (FIFO).
Agent Seeds: Template folders for persona configuration. Users can fork, name, and evolve seeds.
Extension Model: VS Code-style — each extension provides tools, panels, routes.
Direct LLM streaming: No OpenCode proxy. SSE from pi-ai directly.
Pure HTML frontend: No React, no build step. Single index.html with SSE.
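
The steer/queue distinction can be sketched as a small in-memory mailbox. This is an illustration under stated assumptions, not the code in lib/agent-mailbox.js: the record shape `{ kind, text }` is hypothetical, and "urgent" is interpreted here as "delivered before any queued message", which the PR description does not spell out.

```javascript
// In-memory mailbox whose entries are serialized JSONL records.
class Mailbox {
  constructor() {
    this.lines = []; // one JSON string per message
  }
  // Urgent message (assumed to jump ahead of queued ones).
  steer(text) {
    this.lines.push(JSON.stringify({ kind: "steer", text }));
  }
  // Normal message, delivered in FIFO order.
  queue(text) {
    this.lines.push(JSON.stringify({ kind: "queue", text }));
  }
  // Return the oldest steer message if any, else the oldest queued one.
  next() {
    if (this.lines.length === 0) return null;
    const messages = this.lines.map((line) => JSON.parse(line));
    const steerIndex = messages.findIndex((m) => m.kind === "steer");
    const index = steerIndex === -1 ? 0 : steerIndex;
    this.lines.splice(index, 1);
    return messages[index];
  }
  // Serialize pending messages as JSONL (one JSON object per line).
  toJsonl() {
    return this.lines.map((line) => line + "\n").join("");
  }
}

const box = new Mailbox();
box.queue("summarize results");
box.steer("stop and check the logs");
box.queue("write report");
const snapshot = box.toJsonl();
const order = [box.next(), box.next(), box.next()].map((m) => m.text);
```

With this ordering rule the steer message is read first even though it was written second, while the two queued messages keep their original order.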

Reflection

What worked: pi-agent-core API is clean. The AgentTool interface (TypeBox schemas + AgentToolResult) enforces good structure.
What was tricky: Initial tool format was wrong — had to learn the correct AgentTool interface (TypeBox, label, toolCallId, AgentToolResult).
LOC budget: Core is ~1,900 LOC (under the 2K target). Extensions are separate.
Eval is powerful: The scenario-based eval caught the tool integration bug immediately.
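
For readers unfamiliar with the tool shape mentioned above, here is a dependency-free sketch of a tool object following the elements the PR names: a parameter schema, a `label`, an `execute(toolCallId, params)` method, and a `{content, details}` result. The `name` and `description` fields, the `echo` tool itself, and the `{ type: "text", text }` content shape are assumptions for illustration. The schema is written as plain JSON Schema; the PR builds its schemas with TypeBox instead.

```javascript
// Hypothetical tool in the shape described by the PR.
const echoTool = {
  name: "echo", // assumed field
  label: "Echo", // display label, as listed in the PR
  description: "Return the given text unchanged.", // assumed field
  // Plain JSON Schema stand-in for a TypeBox-built schema.
  parameters: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  // Signature per the PR: execute(toolCallId, params).
  async execute(toolCallId, params) {
    return {
      // Content item shape is an assumption.
      content: [{ type: "text", text: params.text }],
      details: { toolCallId, length: params.text.length },
    };
  },
};

// Usage: the returned promise resolves to the result object.
const pending = echoTool.execute("call-1", { text: "hi" });
```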

Complete rewrite foundation:
- Express 5 TS-style backend (~35 files, lean core)
- Agent engine with soul + seed loading (pi-agent-core)
- Agent mailbox (JSONL) with steer/queue primitives
- 9 tools across 3 categories (coding, communication, orchestration)
- Agent seeds: researcher + debugger persona templates
- Chat routes with direct SSE streaming (no OpenCode)
- Agent routes: spawn, steer, queue, status, output
- 37/37 unit tests passing
- CI workflow (lint + test)

- Eval runner with JSON scenario definitions and assertions
- 5 eval scenarios: file creation, bash exec, read+summarize, multi-file, debug
- Assertions: file_exists, file_contains, response_contains, min_tool_calls
- Minimal dark-themed chat frontend (pure HTML/CSS/JS, no React)
- SSE streaming display with tool event indicators

…ensions

BREAKING: All tools now use the correct AgentTool interface:
- TypeBox schemas for parameters (@sinclair/typebox)
- label property for display
- execute(toolCallId, params) signature
- AgentToolResult<T> return: {content: TextContent[], details: T}

Phase 3 additions:
- Extension loader (lib/extension-loader.js) scans extensions/ on startup
- tmux-runner extension (manifest + 5 tools: spawn, read, send, list, kill)
- Server updated to load extensions and serve /extensions route

Eval results: 5/5 scenarios pass against live Anthropic LLM
- file-creation: 3.8s (1 tool, 2 turns)
- bash-execution: 4.7s (1 tool, 2 turns)
- read-and-summarize: 5.1s (1 tool, 2 turns)
- multi-file-task: 7.1s (2 tools, 2 turns)
- debug-error: 12.6s (3 tools, 4 turns)

GindaChen added 3 commits February 25, 2026 00:49

GindaChen wants to merge 3 commits into main from feat/v2-rewrite