Building hierarchical, high-performance multi-agent systems made easy! (Beta)
- Conceptual Overview
- Installation & Setup
- Quickstart: End-to-End Workflow
- Configuration & Storage
- Toolkits
- REST API & CLI
- Core Building Block: BaseModule
- Module Reference
- Advanced Patterns
- Testing
- Troubleshooting & Tips
- Glossary
ROMA is a meta-agent framework that uses recursive hierarchical structures to solve complex problems. By breaking down tasks into parallelizable components, ROMA enables agents to tackle sophisticated reasoning challenges while maintaining transparency that makes context-engineering and iteration straightforward. The framework offers parallel problem solving where agents work simultaneously on different parts of complex tasks, transparent development with a clear structure for easy debugging, and proven performance demonstrated through our search agent's strong benchmark results. We've shown the framework's effectiveness, but this is just the beginning. As an open-source and extensible platform, ROMA is designed for community-driven development, allowing you to build and customize agents for your specific needs while benefiting from the collective improvements of the community.
The ROMA framework processes tasks through a recursive plan→execute loop:

```python
def solve(task):
    if is_atomic(task):                     # Step 1: Atomizer
        return execute(task)                # Step 2: Executor
    else:
        subtasks = plan(task)               # Step 2: Planner
        results = []
        for subtask in subtasks:
            results.append(solve(subtask))  # Recursive call
        return aggregate(results)           # Step 3: Aggregator

# Entry point:
answer = solve(initial_request)
```

- Atomizer → Decides whether a request is atomic (directly executable) or requires planning.
- Planner → If planning is needed, the task is broken into smaller subtasks. Each subtask is fed back into the Atomizer, making the process recursive.
- Executors → Handle atomic tasks. Executors can be LLMs, APIs, or even other agents, as long as they implement an `agent.execute()` interface (see the sketch after this list).
- Aggregator → Collects and integrates results from subtasks. Importantly, the Aggregator produces the answer to the original parent task, not just raw child outputs.
- Top-down: Tasks are decomposed into subtasks recursively.
- Bottom-up: Subtask results are aggregated upwards into solutions for parent tasks.
- Left-to-right: If a subtask depends on the output of a previous one, it waits until that subtask completes before execution.
This structure makes the system flexible, recursive, and dependency-aware: it can decompose complex problems into smaller steps while ensuring results are integrated coherently.
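To make the left-to-right rule concrete, here is a minimal, illustrative sketch of dependency-aware scheduling. It reuses the `execute` helper from the pseudocode above and assumes each subtask lists the indices of the subtasks it depends on; it is not ROMA's actual scheduler:

```python
def solve_with_dependencies(subtasks):
    """Run each subtask only after all of its dependencies have completed."""
    results = {}
    remaining = set(range(len(subtasks)))
    while remaining:
        # Subtasks whose dependencies are all resolved can run now;
        # independent ones could run in parallel.
        ready = [i for i in remaining
                 if all(dep in results for dep in subtasks[i].dependencies)]
        if not ready:
            raise ValueError("Dependency cycle detected")
        for i in ready:
            results[i] = execute(subtasks[i])
            remaining.remove(i)
    return [results[i] for i in range(len(subtasks))]
```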
Click to view the system flow diagram

```mermaid
flowchart TB
    A[Your Request] --> B{Atomizer}
    B -->|Plan Needed| C[Planner]
    B -->|Atomic Task| D[Executor]

    %% Planner spawns subtasks
    C --> E[Subtasks]
    E --> G[Aggregator]

    %% Recursion
    E -.-> B

    %% Execution + Aggregation
    D --> F[Final Result]
    G --> F

    style A fill:#e1f5fe
    style F fill:#c8e6c9
    style B fill:#fff3e0
    style C fill:#ffe0b2
    style D fill:#d1c4e9
    style G fill:#c5cae9
```
Recommended: Complete Setup with Docker (Production-ready, includes all features)

```bash
# One-command setup (builds Docker, starts services, optional E2B/S3)
just setup

# Or with a specific profile
just setup crypto_agent

# Verify services are running
curl http://localhost:8000/health

# Solve your first task
just solve "What is the capital of France?"

# Visualize execution with the interactive TUI (best with MLflow enabled)
just docker-up-full   # Start with MLflow for full visualization
just viz <execution_id>
```

What `just setup` includes:
- Builds Docker images
- Starts all services (PostgreSQL, MinIO, REST API)
- Configures environment (.env)
- Optional: S3 storage mount (prompts)
- Optional: E2B code execution template (prompts)
- Creates CLI shortcuts (`./cli`, `./run`)
Alternative: Manual Docker Setup (Skip prompts)

```bash
# Start services without the setup wizard
just docker-up        # Basic (PostgreSQL + MinIO + API)
just docker-up-full   # With MLflow observability
```

Services Available:

- REST API: http://localhost:8000/docs
- PostgreSQL: Automatic persistence
- MinIO: S3-compatible storage (http://localhost:9001)
- MLflow: http://localhost:5000 (with `docker-up-full`)
See Quick Start Guide and Deployment Guide for details.
ROMA's module layer wraps canonical DSPy patterns into purpose-built components that reflect the lifecycle of complex task execution:
- Atomizer decides whether a request can be handled directly or needs decomposition.
- Planner breaks non-atomic goals into an ordered graph of subtasks.
- Executor resolves individual subtasks, optionally routing through function/tool calls.
- Aggregator synthesizes subtask outputs back into a coherent answer.
- Verifier (optional) inspects the aggregate output against the original goal before delivering.
Every module shares the same ergonomics: instantiate it with a language model (LM) or provider string, choose a prediction strategy, then call `.forward()` (or `.aforward()` for async) with the task-specific fields.

All modules ultimately delegate to DSPy signatures defined in `roma_dspy.core.signatures`. This keeps interfaces stable even as the internals evolve.
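As a minimal sketch of these shared ergonomics (the model string is just an example; any provider supported by `dspy.LM` works):

```python
import dspy
from roma_dspy import Executor

# Pass a ready-made dspy.LM (or a provider string via `model=`)
executor = Executor(
    lm=dspy.LM("openrouter/openai/gpt-4o-mini", temperature=0.2),
    prediction_strategy="cot",
)

result = executor.forward("Convert 42 km to miles")
print(result.output)
```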
Prerequisites:
- Docker & Docker Compose (required)
- Python 3.12+ (for local development)
- Just command runner (optional, recommended)
Complete Setup (Builds Docker, starts services, prompts for E2B/S3):

```bash
# One-command setup
just setup

# Or with a specific profile
just setup crypto_agent
```

Manual Docker Start (Skip setup wizard):
```bash
just docker-up        # Basic services (PostgreSQL + MinIO + API)
just docker-up-full   # With MLflow observability
```

Required Environment Variables (auto-configured by `just setup`):

```bash
# LLM Provider (at least one required)
OPENROUTER_API_KEY=...   # Recommended (single key for all models)
# OR
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GOOGLE_API_KEY=...

# Optional: Toolkit API keys
E2B_API_KEY=...          # Code execution (prompted during setup)
EXA_API_KEY=...          # Web search via MCP
COINGECKO_API_KEY=...    # CoinGecko Pro API (crypto_agent profile)
```

Docker Compose automatically handles PostgreSQL, MinIO, and service configuration.
For development/testing without Docker:

```bash
# Base installation (modules only, no API server)
pip install -e .

# With REST API support (FastAPI + Uvicorn)
pip install -e ".[api]"
```

Note: E2B code execution is included in the base dependencies. For local API usage, you'll need to configure PostgreSQL manually (Docker handles this automatically).

Recommendation: Use the Docker deployment for production features (persistence, API, observability). Local installation is suitable for module development and testing only.
The following example mirrors a typical orchestration loop. It uses three different providers to showcase how easily each module can work with distinct models and strategies.
```python
import dspy

from roma_dspy import Aggregator, Atomizer, Executor, Planner, Verifier, SubTask

# Optional tool that the Executor may call
def get_weather(city: str) -> str:
    """Return a canned weather report for the city."""
    return f"The weather in {city} is sunny."

# Executor geared toward ReAct with a Fireworks model
executor_lm = dspy.LM(
    "fireworks_ai/accounts/fireworks/models/kimi-k2-instruct-0905",
    temperature=0.7,
    cache=True,
)
executor = Executor(
    lm=executor_lm,
    prediction_strategy="react",
    tools=[get_weather],
    context_defaults={"track_usage": True},
)

# Atomizer decides when to branch into planning
atomizer = Atomizer(
    lm=dspy.LM("openrouter/google/gemini-2.5-flash", temperature=0.6, cache=False),
    prediction_strategy="cot",
    context_defaults={"track_usage": True},
)

# Planner produces executable subtasks for non-atomic goals
planner = Planner(
    lm=dspy.LM("openrouter/openai/gpt-4o-mini", temperature=0.85, cache=True),
    prediction_strategy="cot",
    context_defaults={"track_usage": True},
)

aggregator = Aggregator(
    lm=dspy.LM("openrouter/openai/gpt-4o-mini", temperature=0.65),
    prediction_strategy="cot",
)

verifier = Verifier(
    lm=dspy.LM("openrouter/openai/gpt-4o-mini", temperature=0.0),
)

def run_pipeline(goal: str) -> str:
    atomized = atomizer.forward(goal)
    if atomized.is_atomic or atomized.node_type.is_execute:
        execution = executor.forward(goal)
        candidate = execution.output
    else:
        plan = planner.forward(goal)
        results = []
        for subtask in plan.subtasks:
            execution = executor.forward(subtask.goal)
            # Fold each execution output into the subtask goal so the
            # Aggregator can synthesize from the actual results (a simple
            # convention for this example).
            results.append(
                SubTask(
                    goal=f"{subtask.goal}\nResult: {execution.output}",
                    task_type=subtask.task_type,
                    dependencies=subtask.dependencies,
                )
            )
        aggregated = aggregator.forward(goal, results)
        candidate = aggregated.synthesized_result

    verdict = verifier.forward(goal, candidate)
    if verdict.verdict:
        return candidate
    return f"Verifier flagged the output: {verdict.feedback or 'no feedback returned'}"

print(run_pipeline("Plan a weekend in Barcelona and include a packing list."))
```

Highlights:
- Different modules can run on different LMs and temperatures.
- Tools can be provided either at construction or per-call.
- `context_defaults` ensures each `.forward()` call enters a proper `dspy.context()` with the module's LM.
ROMA-DSPy uses OmegaConf for layered configuration with Pydantic validation, and provides execution-scoped storage for complete task isolation.
```python
from roma_dspy.config import load_config

# Load with a profile and runtime overrides
config = load_config(
    profile="crypto_agent",
    overrides=["agents.executor.llm.temperature=0.3"],
)
```

Available Profiles: `general`, `crypto_agent` (list with `just list-profiles`)
See: Configuration Guide for complete documentation on profiles, agent configuration, LLM settings, toolkit configuration, and task-aware agent mapping.
Storage is automatic and execution-scoped - each task gets an isolated directory. Large toolkit responses (>100KB) are automatically stored as Parquet files.
```python
from roma_dspy.core.engine.solve import solve

# Storage is created automatically at: {base_path}/executions/{execution_id}/
result = solve("Analyze blockchain transactions")
```

Features: Execution isolation, S3-compatible, automatic Parquet storage, Docker-managed
See: Deployment Guide for production storage configuration including S3 integration.
ROMA-DSPy includes 9 built-in toolkits that extend agent capabilities:
- Core: FileToolkit, CalculatorToolkit, E2BToolkit (code execution)
- Crypto: CoinGeckoToolkit, BinanceToolkit, DefiLlamaToolkit, ArkhamToolkit
- Search: SerperToolkit (web search)
- Universal: MCPToolkit (connect to any MCP server)
```yaml
agents:
  executor:
    toolkits:
      - class_name: "FileToolkit"
        enabled: true
      - class_name: "E2BToolkit"
        enabled: true
        toolkit_config:
          timeout: 600
```

See: Toolkits Reference for complete toolkit documentation including all tools, configuration options, MCP integration, and custom toolkit development.
ROMA-DSPy provides both a REST API and a CLI for production use.

FastAPI server with interactive documentation:

```bash
# Starts automatically with Docker
just docker-up

# API documentation: http://localhost:8000/docs
# Health check:      http://localhost:8000/health
```

Endpoints: Execution management, checkpoints, visualization, metrics
```bash
# Local task execution
roma-dspy solve "Your task" --profile general

# Server management
roma-dspy server start
roma-dspy server health

# Execution management
roma-dspy exec create "Task"
roma-dspy exec status <id> --watch

# Interactive TUI visualization (requires MLflow for best results)
just viz <execution_id>

# Full help
roma-dspy --help
```

See: API documentation at the /docs endpoint for the complete OpenAPI specification and interactive testing.
All modules inherit from `BaseModule`, located at `roma_dspy/core/modules/base_module.py`. It standardizes:
- signature binding via DSPy prediction strategies,
- LM instantiation and context management,
- tool normalization and merging,
- sync/async entrypoints with safe keyword filtering.
When you instantiate a module, you can either provide an existing `dspy.LM` or let the module build one from a provider string (`model`) and optional keyword arguments (`model_config`).

```python
from roma_dspy import Executor

executor = Executor(
    model="openrouter/openai/gpt-4o-mini",
    model_config={"temperature": 0.5, "cache": True},
)
```

Internally, BaseModule ensures that every `.forward()` call wraps the predictor invocation in:
```python
with dspy.context(lm=self._lm, **context_defaults):
    ...
```

You can inspect the effective LM configuration via `get_model_config()` to confirm the provider, cache settings, or sanitized kwargs.
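For example (the printed shape is illustrative; inspect your own module for the exact keys):

```python
config = executor.get_model_config()
print(config)  # e.g. provider/model name, cache flag, sanitized kwargs
```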
Tools can be supplied as a list, tuple, or mapping of callables accepted by DSPy's ReAct/CodeAct strategies.

```python
executor = Executor(tools=[get_weather])
executor.forward("What is the weather in Amman?", tools=[another_function])
```

BaseModule automatically deduplicates tools based on object identity and merges constructor defaults with per-call overrides.
ROMA exposes DSPy's strategies through the `PredictionStrategy` enum (`roma_dspy/types/prediction_strategy.py`). Use either the enum or a case-insensitive string alias:

```python
from roma_dspy.types import PredictionStrategy

planner = Planner(prediction_strategy=PredictionStrategy.CHAIN_OF_THOUGHT)
executor = Executor(prediction_strategy="react")
```

Available options include Predict, ChainOfThought, ReAct, CodeAct, BestOfN, Refine, Parallel, majority, and more. Strategies that require tools (ReAct, CodeAct) automatically receive any tools you pass to the module.
Every module offers an `aforward()` method. When the underlying DSPy predictor supports async (`acall`/`aforward`), ROMA dispatches asynchronously; otherwise, it gracefully falls back to the sync implementation while preserving awaitability.

```python
result = await executor.aforward("Download the latest sales report")
```

Location: `roma_dspy/core/modules/atomizer.py`
Purpose: Decide whether a goal is atomic or needs planning.
Constructor:

```python
Atomizer(
    prediction_strategy: Union[PredictionStrategy, str] = "ChainOfThought",
    *,
    lm: Optional[dspy.LM] = None,
    model: Optional[str] = None,
    model_config: Optional[Mapping[str, Any]] = None,
    tools: Optional[Sequence | Mapping] = None,
    **strategy_kwargs,
)
```

Inputs (AtomizerSignature):
- `goal: str`

Outputs (AtomizerResponse):

- `is_atomic: bool` – whether the task can run directly.
- `node_type: NodeType` – `PLAN` or `EXECUTE` hint for downstream routing.
Usage:

```python
atomized = atomizer.forward("Curate a 5-day Tokyo itinerary with restaurant reservations")
if atomized.is_atomic:
    ...  # send directly to Executor
else:
    ...  # hand off to Planner
```

The Atomizer is strategy-agnostic but typically uses ChainOfThought or Predict. You can pass hints (e.g., `max_tokens`) via `call_params`:
```python
atomizer.forward(
    "Summarize this PDF",
    call_params={"max_tokens": 200},
)
```

Location: `roma_dspy/core/modules/planner.py`
Purpose: Break a goal into ordered subtasks with an optional dependency graph.

Constructor: identical pattern to the Atomizer.

Inputs (PlannerSignature):

- `goal: str`

Outputs (PlannerResult):

- `subtasks: List[SubTask]` – each has `goal`, `task_type`, and `dependencies`.
- `dependencies_graph: Optional[Dict[str, List[str]]]` – explicit adjacency mapping when returned by the LM.
Usage:

```python
plan = planner.forward("Launch a B2B webinar in 6 weeks")
for subtask in plan.subtasks:
    print(subtask.goal, subtask.task_type)
```

`SubTask.task_type` is a `TaskType` enum that follows the ROMA MECE framework (Retrieve, Write, Think, Code Interpret, Image Generation).
Location: `roma_dspy/core/modules/executor.py`

Purpose: Resolve atomic goals, optionally calling tools/functions through DSPy's ReAct, CodeAct, or similar strategies.

Constructor: same pattern; the most common strategies are ReAct, CodeAct, or ChainOfThought.

Inputs (ExecutorSignature):

- `goal: str`

Outputs (ExecutorResult):

- `output: str | Any`
- `sources: Optional[List[str]]` – provenance or citations.
Usage:

```python
execution = executor.forward(
    "Compile a packing list for a 3-day ski trip",
    config={"temperature": 0.4},  # per-call LM override
)
print(execution.output)
```

To expose tools only for certain calls:
```python
execution = executor.forward(
    "What is the weather in Paris?",
    tools=[get_weather],
)
```

Location: `roma_dspy/core/modules/aggregator.py`
Purpose: Combine multiple subtask results into a final narrative or decision.

Constructor: identical pattern.

Inputs (AggregatorSignature):

- `original_goal: str`
- `subtasks_results: List[SubTask]` – usually the planner's proposals augmented with execution outputs.

Outputs (AggregatorResult base model):

- `synthesized_result: str`
Usage:

```python
from roma_dspy.types import TaskType

aggregated = aggregator.forward(
    original_goal="Plan a data migration",
    subtasks_results=[
        SubTask(goal="Inventory current databases", task_type=TaskType.RETRIEVE),
        SubTask(goal="Draft migration timeline", task_type=TaskType.WRITE),
    ],
)
print(aggregated.synthesized_result)
```

Because it inherits BaseModule, you can still attach tools (e.g., a knowledge-base retrieval function) if your aggregation strategy requires external calls.
Location: `roma_dspy/core/modules/verifier.py`

Purpose: Validate that the synthesized output satisfies the original goal.

Inputs (VerifierSignature):

- `goal: str`
- `candidate_output: str`

Outputs:

- `verdict: bool`
- `feedback: Optional[str]`
Usage:

```python
verdict = verifier.forward(
    goal="Draft a GDPR-compliant privacy policy",
    candidate_output=aggregated.synthesized_result,
)
if not verdict.verdict:
    print("Needs revision:", verdict.feedback)
```

Use `replace_lm()` to reuse the same module with a different LM (useful for A/B testing or fallbacks).
```python
fast_executor = executor.replace_lm(dspy.LM("openrouter/anthropic/claude-3-haiku"))
```

You can alter LM behavior or provide extra parameters without rebuilding the module.
```python
executor.forward(
    "Summarize the meeting notes",
    config={"temperature": 0.1, "max_tokens": 300},
    context={"stop": ["Observation:"]},
)
```

`call_params` (and extra keyword arguments) are filtered to match the DSPy predictor's accepted kwargs, preventing accidental errors.
If you want deterministic tool routing, you can set a dummy LM (or a very low-temperature model) and pass pure Python callables.

```python
from roma_dspy import Executor

executor = Executor(
    prediction_strategy="code_act",
    lm=dspy.LM("openrouter/openai/gpt-4o-mini", temperature=0.0),
    tools={"get_weather": get_weather, "lookup_user": lookup_user},
)
```

ROMA will ensure both constructor and per-call tools are available to the strategy.
```bash
# Run all tests
just test

# Run specific tests
pytest tests/unit/ -v
pytest tests/integration/ -v
```

See: the justfile for all available test commands.
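If you add a module of your own, a unit test can start as small as checking construction and tool wiring. A minimal sketch under those assumptions (this test is illustrative, not part of ROMA's suite):

```python
# tests/unit/test_custom_executor.py -- illustrative sketch only
from roma_dspy import Executor

def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny."

def test_executor_construction():
    executor = Executor(
        model="openrouter/openai/gpt-4o-mini",
        prediction_strategy="react",
        tools=[get_weather],
    )
    # get_model_config() is described above; sanity-check the effective config.
    assert executor.get_model_config() is not None
```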
- `ValueError: Either provide an existing lm` – supply `lm=` or `model=` when constructing the module.
- Invalid prediction strategy – check spelling; strings are case-insensitive but must match a known alias.
- Caching – pass `cache=True` on your LM or set it in `model_config` to reuse previous completions.
- Async contexts – when mixing sync and async calls, ensure an event loop is running (e.g., use `asyncio.run`; see the sketch after this list).
- Tool duplicates – tools are deduplicated by identity; create distinct functions if you need variations.
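For example, a minimal way to drive `aforward()` from synchronous code (reusing the `executor` from the quickstart):

```python
import asyncio

async def main() -> None:
    result = await executor.aforward("Summarize today's standup notes")
    print(result.output)

asyncio.run(main())
```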
- DSPy: Stanford's declarative framework for prompting, planning, and tool integration.
- Prediction Strategy: The DSPy class/function that powers reasoning (CoT, ReAct, etc.).
- SubTask: Pydantic model describing a decomposed unit of work (`goal`, `task_type`, `dependencies`).
- NodeType: Whether the Atomizer chose to `PLAN` or `EXECUTE`.
- TaskType: MECE classification for subtasks (`RETRIEVE`, `WRITE`, `THINK`, `CODE_INTERPRET`, `IMAGE_GENERATION`).
- Context Defaults: Keyword arguments provided to `dspy.context(...)` on every call.
- FileStorage: Execution-scoped storage manager providing isolated directories per task execution.
- DataStorage: Automatic Parquet storage system for large toolkit responses (threshold-based).
- Execution ID: Unique identifier for each task execution, used for storage isolation.
- Base Path: Root directory for all storage operations (local path or S3 bucket).
- Profile: Named configuration preset (e.g., `general`, `crypto_agent`).
- Configuration Override: Runtime value that supersedes profile/default settings.
- BaseToolkit: Abstract base class for all toolkits providing storage integration and tool registration.
- REQUIRES_FILE_STORAGE: Metadata flag indicating a toolkit requires FileStorage (e.g., FileToolkit).
- Toolkit Config: Toolkit-specific settings like API keys, timeouts, and thresholds.
- Tool Selection: Include/exclude lists to filter which tools from a toolkit are available.
- Storage Threshold: Size limit (KB) above which responses are stored in Parquet format.
- Execution-Scoped Isolation: Pattern where each execution gets a unique storage directory.
- Parquet Integration: Automatic columnar storage for large structured data.
- S3 Compatibility: Ability to use S3-compatible storage via Docker volume mounts.
- Tool Registration: Automatic discovery and registration of toolkit methods as callable tools.
Happy building! If you extend or customize a module, keep the signatures aligned so your higher-level orchestration remains stable.
Additional Resources:
- Quick Start Guide - Get started in under 10 minutes
- Configuration Guide - Complete configuration reference
- Toolkits Reference - All built-in and custom toolkits
- Deployment Guide - Production deployment with Docker
- E2B Setup - Code execution toolkit setup
- Observability - MLflow tracking and monitoring
- Configuration System - Configuration profiles and examples
We evaluate a simple search system built with ROMA, called ROMA-Search, across three benchmarks: SEAL-0, FRAMES, and SimpleQA.
Below are the performance graphs for each benchmark.
SealQA is a new challenging benchmark for evaluating Search-Augmented Language models on fact-seeking questions where web search yields conflicting, noisy, or unhelpful results.
View full results
A comprehensive evaluation dataset designed to test the capabilities of Retrieval-Augmented Generation (RAG) systems across factuality, retrieval accuracy, and reasoning.
View full results
Factuality benchmark that measures the ability for language models to answer short, fact-seeking questions.
While ROMA introduces a practical, open-source framework for hierarchical task execution, it is directly built upon two foundational research contributions introduced in WriteHERE:
- Heterogeneous Recursive Planning – The overall architecture of ROMA follows the framework first introduced in prior work on heterogeneous recursive planning, where complex tasks are recursively decomposed into a graph of subtasks, each assigned a distinct cognitive type.
- Type Specification in Decomposition – ROMA's "Three Universal Operations" (THINK, WRITE, SEARCH) generalize the type-specification-in-decomposition hypothesis, which identified reasoning, composition, and retrieval as the three fundamental cognitive types.
These contributions are described in detail in the WriteHERE repository and paper. By explicitly adopting and extending this foundation, ROMA provides a generalizable, versatile, and extensible scaffold and agent system that builds on these insights and makes them usable for builders across domains.
This framework would not have been possible without these amazing open-source contributions!
- Inspired by the hierarchical planning approach described in "Beyond Outlining: Heterogeneous Recursive Planning" by Xiong et al.
- Pydantic - Data validation using Python type annotations
- DSPy - Framework for programming AI agents
- E2B - Cloud runtime for AI agents
If you use the ROMA repo in your research, please cite:

```bibtex
@software{al_zubi_2025_17052592,
  author    = {Al-Zubi, Salah and
               Nama, Baran and
               Kaz, Arda and
               Oh, Sewoong},
  title     = {SentientResearchAgent: A Hierarchical AI Agent
               Framework for Research and Analysis},
  month     = sep,
  year      = 2025,
  publisher = {Zenodo},
  version   = {ROMA},
  doi       = {10.5281/zenodo.17052592},
  url       = {https://doi.org/10.5281/zenodo.17052592},
  swhid     = {swh:1:dir:69cd1552103e0333dd0c39fc4f53cb03196017ce;origin=https://doi.org/10.5281/zenodo.17052591;visit=swh:1:snp:f50bf99634f9876adb80c027361aec9dff973433;anchor=swh:1:rel:afa7caa843ce1279f5b4b29b5d3d5e3fe85edc95;path=salzubi401-ROMA-b31c382},
}
```

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.




