feat: Research Agent v2 — pi-agent-core rewrite#296

Open
GindaChen wants to merge 3 commits into main from feat/v2-rewrite
@GindaChen
Collaborator

Research Agent v2: Full Rewrite

Complete rewrite of the research agent backend from Python/OpenCode to TypeScript/pi-agent-core.

What Changed

  • Deleted entire Python backend, OpenCode dependency, AgentSys, Wild Loop V2
  • Built lean Express 5 backend (~1,900 LOC) with pi-agent-core
  • Agent Engine: soul loader + seed loader (persona templates) + factory functions
  • Agent Mailbox: JSONL inbox/outbox with steer/queue primitives
  • 9 Tools: bash, readFile, writeFile, listFiles, sendToMaster, sendToWorker, reportComplete, spawnWorker, listWorkers — all using TypeBox schemas + AgentToolResult
  • Agent Seeds: researcher/ and debugger/ persona templates (soul.md + tools.json + defaults.json)
  • Extension System: loader + tmux-runner extension (5 tools)
  • Chat Frontend: minimal dark-themed HTML/CSS/JS (no React, no build step)
  • Eval Harness: 5 scenarios with assertions (file_exists, file_contains, response_contains, min_tool_calls)
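As an illustration of the seed layout listed above (soul.md + tools.json + defaults.json), here is a minimal sketch of a seed loader. The names `loadSeed`, `AgentSeed`, and `seedsDir` are hypothetical, and the internal format of tools.json and defaults.json is not specified in this PR, so both are simply parsed as JSON. This is not the code in lib/agent-engine.js.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical shape of a loaded seed; field names are assumptions.
interface AgentSeed {
  name: string;
  soul: string;                      // text of soul.md (the persona prompt)
  tools: unknown;                    // parsed tools.json, format unspecified in the PR
  defaults: Record<string, unknown>; // parsed defaults.json
}

// Read one persona template folder, e.g. <seedsDir>/researcher/.
function loadSeed(seedsDir: string, name: string): AgentSeed {
  const dir = path.join(seedsDir, name);
  const readText = (file: string): string =>
    fs.readFileSync(path.join(dir, file), "utf8");

  return {
    name,
    soul: readText("soul.md").trim(),
    tools: JSON.parse(readText("tools.json")),
    defaults: JSON.parse(readText("defaults.json")) as Record<string, unknown>,
  };
}
```

Under these assumptions, `loadSeed("agent-seeds", "researcher")` would return the researcher persona, and a missing file surfaces as the usual `fs` error rather than a silent default.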

Architecture

server.js (38 LOC)  →  Express 5 + extension loader
lib/agent-engine.js  →  Soul + seed loading, agent factories
lib/agent-mailbox.js →  JSONL inbox/outbox, steer/queue, registry
lib/tools/           →  9 tools (coding, communication, orchestration)
agent-seeds/         →  Persona templates (researcher, debugger)
extensions/          →  Installable modules (tmux-runner)
routes/              →  chat (SSE streaming) + agents (spawn/steer/queue)
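To make the SSE streaming in routes/ concrete, here is a small sketch of how a Server-Sent Events frame can be written on the server and parsed on the client. The event names (`tool_start`, `text_delta`) and both function names are hypothetical and are not taken from this PR. Only the wire format (`event:` and `data:` lines, with a blank line ending each frame) comes from the SSE standard.

```typescript
// Serialize one SSE frame: an event name plus a JSON payload on a single data line.
// JSON.stringify escapes newlines inside strings, so the payload stays on one line.
function sseFrame(event: string, data: object): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Parse a buffer of complete frames back into events.
// Assumes the buffer ends on a frame boundary; a real client must also
// handle frames split across network chunks.
function parseSseFrames(buffer: string): { event: string; data: unknown }[] {
  return buffer
    .split("\n\n")
    .filter((block) => block.trim().length > 0)
    .map((block) => {
      let event = "message"; // default event type when no "event:" line is present
      const dataLines: string[] = [];
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
      }
      return { event, data: JSON.parse(dataLines.join("\n")) };
    });
}
```

In an Express handler, each frame would typically be sent with `res.write(sseFrame(...))` after setting `Content-Type: text/event-stream`.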

Test Results

Unit Tests: 38/38 ✅

  • Agent mailbox: steer tagging, queue FIFO, status management, registry, polling
  • Agent engine: soul loading, seed loading, agent creation
  • Tools: bash timeout, file I/O, composers, discovery

Eval Tests (live LLM): 5/5 ✅

Scenario            Duration  Tool Calls  Turns
file-creation       3.8s      1           2
bash-execution      4.7s      1           2
read-and-summarize  5.1s      1           2
multi-file-task     7.1s      2           2
debug-error         12.6s     3           4
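The four assertion types used by the eval harness (file_exists, file_contains, response_contains, min_tool_calls) could be checked roughly as follows. This is a sketch under assumptions: the field names (`path`, `text`, `count`), the `RunResult` shape, and the in-memory file snapshot are all hypothetical, and the actual JSON scenario schema is not shown in this PR.

```typescript
// Hypothetical assertion shapes; only the four type names come from the PR.
type Assertion =
  | { type: "file_exists"; path: string }
  | { type: "file_contains"; path: string; text: string }
  | { type: "response_contains"; text: string }
  | { type: "min_tool_calls"; count: number };

// Hypothetical summary of one eval run.
interface RunResult {
  files: Map<string, string>; // workspace snapshot: path -> contents
  response: string;           // final assistant text
  toolCalls: number;          // number of tool invocations observed
}

function checkAssertion(a: Assertion, r: RunResult): boolean {
  switch (a.type) {
    case "file_exists":
      return r.files.has(a.path);
    case "file_contains": {
      const content = r.files.get(a.path);
      return content !== undefined && content.includes(a.text);
    }
    case "response_contains":
      return r.response.includes(a.text);
    case "min_tool_calls":
      return r.toolCalls >= a.count;
  }
}

// A scenario passes when this returns an empty list.
function failedAssertions(assertions: Assertion[], r: RunResult): Assertion[] {
  return assertions.filter((a) => !checkAssertion(a, r));
}
```

Returning the failed assertions, rather than a single boolean, makes it easier to report which expectation a scenario missed.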

Key Design Decisions

  1. Wild Loop = pi-mono loop: No custom event queue. Two primitives: steer (urgent) and queue (FIFO).
  2. Agent Seeds: Template folders for persona configuration. Users can fork, name, and evolve seeds.
  3. Extension Model: VS Code-style — each extension provides tools, panels, routes.
  4. Direct LLM streaming: No OpenCode proxy. SSE from pi-ai directly.
  5. Pure HTML frontend: No React, no build step. Single index.html with SSE.
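The two mailbox primitives from decision 1 can be sketched as below. This is a minimal in-memory model, not the code in lib/agent-mailbox.js: the class and field names are hypothetical, and it assumes that "urgent" means a steer message is delivered before any queued message, while queued messages keep FIFO order. Each message is stored as one JSON line, mirroring the JSONL format.

```typescript
type MessageKind = "steer" | "queue";

interface MailboxMessage {
  id: number;
  kind: MessageKind;
  text: string;
}

class Mailbox {
  private lines: string[] = []; // one JSON document per entry, as in a JSONL file
  private nextId = 1;

  steer(text: string): void {
    this.append("steer", text);
  }

  queue(text: string): void {
    this.append("queue", text);
  }

  private append(kind: MessageKind, text: string): void {
    this.lines.push(JSON.stringify({ id: this.nextId++, kind, text }));
  }

  // Deliver the oldest steer message if any exists, otherwise the oldest queued one.
  next(): MailboxMessage | undefined {
    if (this.lines.length === 0) return undefined;
    const messages = this.lines.map((l) => JSON.parse(l) as MailboxMessage);
    const steerIndex = messages.findIndex((m) => m.kind === "steer");
    const index = steerIndex === -1 ? 0 : steerIndex;
    this.lines.splice(index, 1);
    return messages[index];
  }

  // Serialized form, suitable for writing to an inbox file.
  toJsonl(): string {
    return this.lines.join("\n");
  }
}
```

With this model, queueing "a" and "b" and then steering "c" delivers "c" first, followed by "a" and "b" in order.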

Reflection

  • What worked: pi-agent-core API is clean. The AgentTool interface (TypeBox schemas + AgentToolResult) enforces good structure.
  • What was tricky: Initial tool format was wrong — had to learn the correct AgentTool interface (TypeBox, label, toolCallId, AgentToolResult).
  • LOC budget: Core is ~1,900 LOC (under the 2K target). Extensions are separate.
  • Eval is powerful: The scenario-based eval caught the tool integration bug immediately.

Complete rewrite foundation:
- Express 5 TS-style backend (~35 files, lean core)
- Agent engine with soul + seed loading (pi-agent-core)
- Agent mailbox (JSONL) with steer/queue primitives
- 9 tools across 3 categories (coding, communication, orchestration)
- Agent seeds: researcher + debugger persona templates
- Chat routes with direct SSE streaming (no OpenCode)
- Agent routes: spawn, steer, queue, status, output
- 37/37 unit tests passing
- CI workflow (lint + test)
- Eval runner with JSON scenario definitions and assertions
- 5 eval scenarios: file creation, bash exec, read+summarize, multi-file, debug
- Assertions: file_exists, file_contains, response_contains, min_tool_calls
- Minimal dark-themed chat frontend (pure HTML/CSS/JS, no React)
- SSE streaming display with tool event indicators
…ensions

BREAKING: All tools now use the correct AgentTool interface:
- TypeBox schemas for parameters (@sinclair/typebox)
- label property for display
- execute(toolCallId, params) signature
- AgentToolResult<T> return: {content: TextContent[], details: T}
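A rough sketch of the tool shape described above follows. The types are declared locally so the example stands alone, and they are assumptions rather than the actual pi-agent-core definitions: in particular, the `TextContent` fields, the async `execute`, and the `name` and `description` properties are guesses. The PR uses TypeBox schemas for `parameters`; a plain JSON Schema object stands in for one here. `wordCount` is a hypothetical tool, not one of the nine in this PR.

```typescript
// Local stand-ins for the library types (assumed shapes).
interface TextContent {
  type: "text";
  text: string;
}

interface AgentToolResult<T> {
  content: TextContent[];
  details: T;
}

interface AgentTool<P, T> {
  name: string;
  label: string;      // display label
  description: string;
  parameters: object; // a TypeBox schema in the PR; plain JSON Schema here
  execute(toolCallId: string, params: P): Promise<AgentToolResult<T>>;
}

interface WordCountParams {
  text: string;
}

interface WordCountDetails {
  toolCallId: string;
  words: number;
}

const wordCountTool: AgentTool<WordCountParams, WordCountDetails> = {
  name: "wordCount",
  label: "Word Count",
  description: "Count whitespace-separated words in a string.",
  parameters: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  async execute(
    toolCallId: string,
    params: WordCountParams
  ): Promise<AgentToolResult<WordCountDetails>> {
    const trimmed = params.text.trim();
    const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
    return {
      content: [{ type: "text", text: `${words} words` }], // text shown to the model
      details: { toolCallId, words },                      // structured data for the host
    };
  },
};
```

The split between `content` and `details` lets the model see a short text result while the host keeps typed data for logging or UI.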

Phase 3 additions:
- Extension loader (lib/extension-loader.js) scans extensions/ on startup
- tmux-runner extension (manifest + 5 tools: spawn, read, send, list, kill)
- Server updated to load extensions and serve /extensions route

Eval results: 5/5 scenarios pass against live Anthropic LLM
- file-creation: 3.8s (1 tool, 2 turns)
- bash-execution: 4.7s (1 tool, 2 turns)
- read-and-summarize: 5.1s (1 tool, 2 turns)
- multi-file-task: 7.1s (2 tools, 2 turns)
- debug-error: 12.6s (3 tools, 4 turns)