Model Context Protocol server for Gnosis Chain analytics on top of ClickHouse and dbt.
Cerebro MCP exposes query, schema, visualization, report, and long-horizon research tools to MCP hosts such as Claude Desktop, Claude Code, and VS Code. It is designed for three distinct usage patterns:
- Quick exploratory analysis and factual lookups
- Interactive chart and report generation
- Durable, phase-driven research projects
This README is intentionally implementation-oriented. It describes what the server actually does today, how the MCP flows work, and which tools to use for each job.
Cerebro MCP is a FastMCP server with:
- ClickHouse access for read-only SQL
- dbt manifest search and lineage lookup
- interactive chart and report generation
- reasoning/tracing utilities
- durable research project storage
- parameterized custom query tools loaded from YAML config
- AI dashboard tab scaffolding for metrics-dashboard
- number verification tool that checks arithmetic and cross-references before reporting
- MCP prompts and resources that guide clients, but do not run automatically on their own
- security audit layer with tool risk classification, suspicious-call detection, and append-only JSONL logging
- six React + ECharts mini-apps (Report, Token Explorer, Metric Lab, Portfolio, Yield Opportunities, Graph Explorer) served as single-file HTML via
  `ui://cerebro/<app>` resources — see `docs/MINI_APPS.md` for the full tour
- mini-app infrastructure for interactive in-chat views (Metric Lab, Token Explorer)
Important distinction:
- Tools execute logic on the server
- Prompts return guidance text to the client
- Resources expose static reference material
- The server does not run an internal autonomous LLM loop
If a client wants to use personas or peer review, it must explicitly call the prompt or tool flow that supports it.
Cerebro ships with 23 agent personas loadable via `get_agent_persona(role)`. They are prompt-layer guidance — the LLM adopts a persona's rules for the duration of a task, then can switch. Personas are organized into three tiers:
| Role | Purpose |
|---|---|
| `cerebro_dispatcher` | Intent-triage and gated routing. Start here for any non-trivial request — classifies the ask, runs `preflight_analytics_request`, picks the specialist chain, and emits a mandatory dispatch manifest. Full rules: docs/cerebro-docs — MCP / Dispatcher. |
| Role | Purpose |
|---|---|
| `analytics_reporter` | Data Science Lead — standard analysis + report SOP (preflight → discovery → EDA → charts → report). |
| `reality_checker` | QA gate — SQL safety audit, data validation, chart-type fit, narrative consistency before delivery. |
| `ui_designer` | Chart selection, ECharts styling, report markdown layout. |
| `gnosis_research_analyst` | Semantic-first multi-phase research — wraps `start_research_project` through `publish_research_report`. |
| `storyteller_orchestrator` | Coordinates the 7-persona Storytelling-with-Data pipeline (see the Storyteller Workflow section). |
| `mmm_analyst` | Marketing Mix Modeling SOP — spine-fill → multicollinearity → baseline → adstock/Hill → contribution decomposition. |
| `mmm_causal_reviewer` | DAG gate before any MMM `generate_report`: chronological, non-inclusion, identifiability checks per Hakuhodo Guidebook Ch. 3. |
| `mmm_simulator` | Budget reallocation and marginal ROI — bounded at ±30%/period. |
| Role | Topic keywords |
|---|---|
| `growth_analyst` | DAU / WAU / MAU, retention cohorts, funnel analysis, new-vs-returning |
| `forecasting_analyst` | time-series decomposition (`seriesDecomposeSTL`), seasonality, forecasts with confidence bands |
| `defi_analyst` | TVL, utilization, lending protocols (Aave/Agave/Spark), DEXs (UniV3/Balancer/CoW/Swapr), LP/IL |
| `tokenomics_analyst` | staking APY, GNO supply, validator concentration (HHI / Gini / Nakamoto) |
| `network_health_analyst` | client diversity, p2p, geographic distribution, decentralization thresholds |
| `bridge_security_analyst` | bridge flow anomaly detection, directional imbalance, bridge-efficiency comparison |
| `marketing_analyst` | external-audience framing, investor updates, grant narratives |
| `esg_analyst` | validator energy, carbon intensity, GHG Scope 2, efficiency trends |
| `statistical_reviewer` | methodology challenge, sample-size review, p-hacking / multiple-testing correction, CI construction |
| `storyteller_context`, `storyteller_narrative`, `storyteller_visual_designer`, `storyteller_writer`, `storyteller_critic`, `storyteller_accessibility` | Sub-phases of the storyteller pipeline — invoked by `storyteller_orchestrator` |
Gating: when the dispatcher emits a manifest listing required specialists, the session treats it as a binding execution contract. The MMM flow hard-blocks generate_report until mmm_causal_reviewer returns VERDICT: PASS. The storyteller flow hard-blocks final handoff until every clarity check passes. See docs/cerebro-docs — MCP / Agents for the full agent catalog, gate semantics, and routing table.
At runtime the system looks like this:
- Your MCP host connects over `stdio` or `SSE`
- The host asks the model to solve a task
- The model chooses tools, prompts, and resources from this server
- The server executes ClickHouse/dbt/report/research logic and returns structured data
- For report tools, MCP App compatible clients can render the returned `structuredContent` inline
Core runtime components:
- `clickhouse_client.py`: shared execution pipeline, limit enforcement, JSON-safe normalization, schema cache
- `tools/query.py`: sync SQL execution
- `tools/query_async.py`: async query jobs with paginated result previews
- `tools/schema.py`: table listing and schema inspection
- `tools/dbt.py`: model search and lineage/context lookup
- `tools/visualization.py`: chart registry, report rendering, report persistence
- `tools/research.py`: durable research projects, evidence, verification, peer review, publication
- `tools/mini_apps.py`: mini-app view infrastructure, app-only tool visibility filter
- `tools/metric_lab.py`: interactive Metric Lab views
- `tools/token_explorer.py`: interactive Token Explorer views
- `security.py`: tool risk classification, suspicious-call flagging, append-only JSONL security audit log
- `observability.py`: Prometheus metrics, structured JSON logging, security counters
Artifacts created by the server:
- saved reports on disk
- saved query SQL snippets
- async query result pages
- research projects and evidence snapshots
- reasoning traces
- security audit logs (JSONL, daily rotation)
| Workflow | Use it for | Core tools | Output |
|---|---|---|---|
| Semantic analytics | governed metrics, multi-hop dimension reachability, explainable metric SQL, research evidence with provenance | `discover_metrics`, `get_metric_details`, `explain_metric_query`, `query_metrics` | semantic query result, compiled SQL, retry traces, semantic provenance |
| Quick exploration | answering a direct question, checking a metric, testing a hypothesis quickly | `discover_models`, `describe_table`, `execute_query`, `explain_query`, `get_sample_data` | markdown summary plus structured query payload |
| Report workflow | visual analysis, KPI packs, weekly or monthly summaries, chart-heavy deliverables | `discover_models`, `describe_table`, `execute_query`, `generate_charts`, `generate_report` | interactive report artifact plus saved HTML |
| Research workflow | multi-step investigations that need memory, evidence, review, and publication | `start_research_project`, phase tools, evidence tools, verification, peer review, `publish_research_report` | durable research project plus report artifact |
| Storyteller workflow | decision-oriented narratives, executive briefs, stakeholder memos, recommendation artifacts where an audience must be moved to action | `storyteller_start_session`, `storyteller_record_*`, `storyteller_run_clarity_checks`, `storyteller_generate_story_report` | gated, narrative-first report artifact with action titles, focal-point design, and adversarial clarity review |
| MMM workflow | sector contribution attribution, ROI across incentive programs, "which emissions drove TVL", budget reallocation | `get_agent_persona("mmm_analyst")` → `get_agent_persona("mmm_causal_reviewer")` (gate) → `get_agent_persona("mmm_simulator")` (optional) → `generate_charts` → `generate_report` | contribution stacked-area, spend-vs-effectiveness share, response-curve scatter, adstock decay, causal-review table |
| Custom query tools | common domain questions with known parameters | `get_validator_balance_history`, `get_token_transfers_for_address`, etc. | parameterized query result |
| Dashboard scaffolding | creating new dashboard tabs from semantic metrics | `discover_dashboard_metrics`, `scaffold_dashboard_tab` | JS query files + YAML config |
| Number verification | any computed numbers before reporting to user | `verify_numbers` | PASS/MISMATCH verdict |
Use quick exploration when you do not need charts or durable state. Use reports when the deliverable is a visual artifact. Use research when the work spans multiple phases or needs explicit evidence and review.
Use this when the user asks something like:
- "How many transactions were there yesterday?"
- "What tables cover validator activity?"
- "Compare bridge inflows over the last 7 days"
Recommended sequence:
- Find the right data source
  - Prefer `discover_models(query, detail_top_n=5)` as the first step
  - Use `search_models` + `get_model_details` only when you want broader manual triage
- Verify the actual schema
  - Call `describe_table` before writing SQL
- Run a bounded query
  - Use `execute_query`
  - Use `explain_query` first if you are unsure about the SQL shape
- If the query is large or slow
  - Use `start_query`
  - Poll with `get_query_results`
Minimal pattern:
discover_models("bridge volume daily", detail_top_n=5)
describe_table("api_bridges_volume_daily", database="dbt")
execute_query("SELECT dt, volume_usd FROM dbt.api_bridges_volume_daily WHERE dt >= today() - 7 ORDER BY dt", database="dbt", max_rows=30)
What to expect from execute_query:
- structured payload with `columns`, `rows`, `row_count`, `rows_returned`, `warnings`
- `summary_markdown` for a human-readable table
- automatic row and payload truncation to protect MCP/LLM context
When to use async instead:
- expected result set is large
- query may take longer than a normal chat turn
- you want paginated result retrieval
Async flow:
start_query(...)
get_query_results(query_id)
get_query_results(query_id, page_token="...")
Important behavior:
- async jobs are in-process, not permanent
- restarting the server loses pending/finished async job state
- completed async jobs are cleaned up after a short retention window
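A minimal pagination sketch (the job payload fields and the `next_page_token` name are assumptions — follow whatever continuation token `get_query_results` actually returns):

```python
# Hypothetical polling/pagination loop for the async flow above.
job = start_query(
    "SELECT dt, volume_usd FROM dbt.api_bridges_volume_daily ORDER BY dt",
    database="dbt",
)

page = get_query_results(job["query_id"])   # first page, or a "running" status
while page.get("next_page_token"):          # illustrative token field
    page = get_query_results(job["query_id"], page_token=page["next_page_token"])
```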
Use this when the user is really asking for a governed metric question, for example:
- "Show transaction count by sector over the last 30 days"
- "Which dimensions can I slice validator rewards by?"
- "Explain how this metric query was planned"
Recommended sequence:
- Discover the metric surface
  - call `discover_metrics(query, module=..., limit=...)`
- Inspect the resolved metric
  - call `get_metric_details(metric_name)`
- Explain before execution when the path is non-trivial
  - call `explain_metric_query(...)`
- Execute through the semantic layer
  - call `query_metrics(...)`
- If semantic execution is unavailable or unsupported
  - surface the returned fallback reason
  - move to raw SQL with `describe_table`, `get_model_details`, `execute_query`, and `get_clickhouse_query_rules`
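Minimal pattern (the metric name and argument shapes are illustrative — take the real ones from `discover_metrics` and `get_metric_details` output):

```python
discover_metrics("transaction count by sector", limit=5)
get_metric_details("transaction_count")   # hypothetical metric name
explain_metric_query(metric="transaction_count", dimensions=["sector"], grain="day")
query_metrics(metric="transaction_count", dimensions=["sector"], grain="day")
```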
Important behavior:
- semantic execution is deployment-gated by `SEMANTIC_ENABLED`
- when semantic is enabled but artifacts are stale or unavailable, semantic tools stay registered but execution returns graceful unavailable guidance
- `query_metrics` performs one deterministic server-side repair retry for known-safe ClickHouse errors
- raw `execute_query` remains agent-guided rather than auto-healed by the server
Use this when the user asks common domain questions with known parameters:
- "What is validator 12345's balance history?"
- "Show me GNO transfers for 0x..."
- "What are the bridge flows for USDC?"
These are pre-built, peer-reviewed SQL templates. The LLM passes typed parameters; ClickHouse handles binding natively. Zero SQL hallucination risk.
Recommended sequence:
- Check available tools
  - Call `list_custom_tools` to see available parameterized tools
- Call the matching tool directly
  - e.g. `get_validator_balance_history(validator_index=12345, start_date="2025-01-01")`
Example tools:
get_validator_balance_history(validator_index=12345, start_date="2025-01-01")
get_token_transfers_for_address(address="0x...", token_symbol="GNO")
get_bridge_flows_by_token(token="USDC", start_date="2025-06-01")
get_gpay_wallet_activity(wallet_address="0x...")
Custom tools use ClickHouse native parameter binding (`{param:Type}`) and are immune to SQL injection. When no custom tool exists, fall back to the raw SQL workflow.
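For orientation, a definition might look roughly like this (the keys, model, and columns are illustrative, not the actual custom_tools.yaml contents):

```python
# Hypothetical shape of one custom tool definition.
CUSTOM_TOOL = {
    "name": "get_validator_balance_history",
    "parameters": {"validator_index": "UInt64", "start_date": "Date"},
    # ClickHouse substitutes the typed {param:Type} placeholders server-side,
    # so parameter values are never concatenated into the SQL string.
    "sql": """
        SELECT dt, balance_gno                  -- hypothetical columns
        FROM dbt.api_validator_balances_daily   -- hypothetical model
        WHERE validator_index = {validator_index:UInt64}
          AND dt >= {start_date:Date}
        ORDER BY dt
    """,
}
```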
The verify_numbers tool catches computation errors before numbers reach the user. The agent must show its work — the formula it used and the component values — so the tool can independently verify the math.
When to call: before reporting any computed number (sums, nets, percentages, totals).
How it works:
- Agent computes numbers from query results
- Agent calls `verify_numbers` with structured claims:
  - `label`: what the number represents
  - `value`: the computed number
  - `formula`: the arithmetic used (e.g., "received - sent")
  - `components`: the named values (e.g., {"received": 9352.5, "sent": 9002.9})
  - `check_query`: optional SQL against an independent model for cross-reference
- Tool verifies: does `received - sent` actually equal the claimed value?
- Tool runs `check_query` if provided and compares
- Returns PASS or MISMATCH with specific error details
Example catching a real bug:
The agent computed transfers and claimed net inflow = 1360.5 GNO:
verify_numbers('[{"label": "net GNO inflow", "value": 1360.5, "formula": "received - sent", "components": {"received": 9352.5, "sent": 9002.9}}]')
Result: ARITHMETIC FAIL — 9352.5 - 9002.9 = 349.6, NOT 1360.5 (diff: 289.2%). The agent must correct to 349.6 before presenting to the user.
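The optional `check_query` cross-reference uses the same claim shape (the model name below is illustrative):

```python
verify_numbers(
    '[{"label": "net GNO inflow", "value": 349.6,'
    ' "formula": "received - sent",'
    ' "components": {"received": 9352.5, "sent": 9002.9},'
    ' "check_query": "SELECT sum(net_amount) FROM dbt.api_gno_net_flows_daily'
    ' WHERE dt >= today() - 7"}]'
)
```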
Use this when the user wants:
- a weekly or monthly summary
- charts, trends, dashboards, or visual comparisons
- a report artifact that can be reopened or exported
Recommended sequence:
- Discover relevant models
  - Prefer `discover_models(query, detail_top_n=5)`
- Verify a real table with `describe_table`
- Run exploratory queries with `execute_query`
- Build charts
  - Preferred: `generate_charts`
  - Secondary: `generate_chart`
  - For a one-off plot with no gates: `quick_chart`
- Write markdown with chart placeholders
- Call `generate_report`
- Optionally use `open_report`, `list_reports`, or `export_report`
Preferred report chart flow:
generate_charts([
{...},
{...},
{...}
])
generate_report(title="Gnosis Chain Weekly Report", content_markdown="...")
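A filled-in spec might look like this (the field names are an assumed shape, not the server's actual chart schema — follow the tool's parameter docs):

```python
generate_charts([
    {
        "chart_type": "line",   # illustrative field names
        "title": "Daily bridge volume (USD)",
        "sql": "SELECT dt, volume_usd FROM dbt.api_bridges_volume_daily"
               " WHERE dt >= today() - 30 ORDER BY dt",
    },
])
```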
| Tool | Use when | Notes |
|---|---|---|
| `quick_chart` | one-off exploratory plot | bypasses report/discovery gates |
| `generate_chart` | you need one chart at a time | gated; useful for a single additional chart |
| `generate_charts` | you are building a real report | preferred; creates multiple charts in one call |
The report renderer understands:
- `## Heading` for major sections
- `{{chart:chart_id}}` to place a chart
- `{{grid:N}} ... {{/grid}}` to group charts into a grid row
Example:
## Executive Summary
{{grid:3}}
{{chart:chart_1}}
{{chart:chart_2}}
{{chart:chart_3}}
{{/grid}}
Key takeaways from the week.
## Activity Trend
{{chart:chart_4}}
## Breakdown
{{grid:2}}
{{chart:chart_5}}
{{chart:chart_6}}
{{/grid}}

Hard gates for chart creation:
- `generate_chart` and `generate_charts` require:
  - at least one discovery call (`search_models` or `discover_models`)
  - at least `MIN_MODELS_DETAILED` model detail explorations
  - at least `MIN_TABLES_VERIFIED` table schema checks
Hard gates for report creation:
- at least `MIN_CHARTS_FOR_REPORT` registered charts
- at least one trend or breakdown chart
- at least `MIN_EXPLORATORY_QUERIES` prior query executions
- at least one dimensional breakdown
- at least one relational analysis signal:
  - a scatter or heatmap chart, or
  - a correlation query
Soft warnings only:
- lack of statistical queries
- lack of explicit correlation/regression queries
- shallow exploration before charting
- no scatter chart in a larger report
Layout enforcement:
- if the report contains two or more `numberDisplay` charts and no `{{grid:N}}` block, report generation is rejected
- other layout guidance is recommended through prompts and personas, but not hard-blocked by the renderer
This matters because older documentation may imply stricter layout or persona enforcement than the server actually performs.
generate_report does three things:
- saves a standalone HTML report to disk
- returns `structuredContent` for MCP App capable clients
- returns a text summary containing the report ID and reopen/export hints
Useful report tools after creation:
- `open_report(report_id)` to reopen a saved report
- `list_reports()` to browse saved reports
- `export_report(report_id)` to get a file path or SSE download URL
Use this when the user wants:
- a deep-dive investigation
- a multi-turn, multi-phase analysis
- durable memory and evidence
- explicit verification and peer review
The research system is MCP-native and client-orchestrated. It is not an autonomous background scientist. The assistant must explicitly drive the project through its phases.
Research phases:
1. mapping
2. hypothesis
3. execution
4. verification
5. publication
Recommended sequence:
start_research_project(...)
plan_research_phase(..., phase="mapping")
execute_research_phase(..., phase="mapping")
capture_schema_snapshot(...)
record_research_memory(...)
plan_research_phase(..., phase="hypothesis")
execute_research_phase(..., phase="hypothesis")
plan_research_phase(..., phase="execution")
execute_research_phase(..., phase="execution")
execute_query(..., research_project_id=..., persist_result=True)
attach_research_evidence(...)
record_research_finding(...)
verify_research_phase(...)
prepare_peer_review(...)
conduct_research_peer_review(...)
record_peer_review(...)
publish_research_report(...)
get_research_project(project_id) returns a compact summary only:
- current phase
- status
- evidence count
- memory count
- findings count
- peer review status
- artifact count
For detail, use the paginated retrieval tools:
- `get_research_memory`
- `get_research_evidence`
- `get_research_findings`
Durable evidence kinds:
- `query_result`
- `semantic_query_result`
- `schema_snapshot`
- `chart`
- `report`
Important:
- `save_query` is not evidence
- `save_query` stores reusable SQL only
- durable sync query evidence must come from `execute_query(..., persist_result=True)`
When you run:
execute_query(
sql=...,
research_project_id="rp_...",
persist_result=True,
evidence_title="Daily bridge volume",
persist_max_rows=10000
)
the server does two separate things:
- returns a tool-budgeted preview payload to the model
- stores a larger durable result artifact on disk and returns `result_ref_id`
You can then attach that result to the project:
attach_research_evidence(
project_id="rp_...",
kind="query_result",
ref_id="qry_...",
phase="execution"
)
Use record_research_memory for reusable domain facts:
- decimals and units
- caveats about tables
- join constraints
- protocol-specific notes
Use record_research_finding for project-specific claims:
- "Bridge inflows increased after proposal X"
- "Median validator count remained stable"
verify_research_phase is a structural check. It verifies things like:
- execution evidence exists
- statistical depth exists in the evidence set
- evidence references resolve cleanly
If verification fails, publication stays blocked.
Peer review is explicitly client-driven:
- call `prepare_peer_review`
- feed the packet into `conduct_research_peer_review(packet_json)`
- call `record_peer_review`
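Minimal pattern (argument names are illustrative):

```python
packet = prepare_peer_review(project_id="rp_...")
# The client applies the conduct_research_peer_review prompt to the packet.
review = conduct_research_peer_review(packet_json=packet)
record_peer_review(project_id="rp_...", review_json=review)
```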
The review result is structured and stored as PeerReviewResult.
publish_research_report reuses the normal report renderer but applies research-phase gating instead of report-mode session gating.
Use it after:
- verification passes
- peer review is recorded
- the peer review decision is not `rejected`
Use this when you need to create new dashboard tabs in the metrics-dashboard Vite application.
Recommended sequence:
- Discover available metrics
  - Call `discover_dashboard_metrics(module="bridges")` to browse api_* models
- Build a blueprint
  - Construct a DashboardBlueprint JSON with tab config and query specs
- Preview changes
  - Call `scaffold_dashboard_tab(blueprint_json)` with `dry_run: true`
- Apply changes
  - Call again with `dry_run: false` to write JS query files and merge YAML config
The scaffold tool generates JS query files in metrics-dashboard/src/queries/ and merges tab config into the appropriate YAML dashboard file. It is idempotent: re-running updates existing files rather than creating duplicates.
Important: The LLM never writes React/JSX directly. Only the scaffold tool generates UI code from structured JSON blueprints validated by Pydantic.
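A dry-run call might look like this (the blueprint keys are illustrative — the real schema is the server's Pydantic DashboardBlueprint model):

```python
import json

blueprint = {
    "tab_name": "Bridges",   # hypothetical keys
    "queries": [{"model": "api_bridges_volume_daily", "chart": "line"}],
}

scaffold_dashboard_tab(blueprint_json=json.dumps(blueprint), dry_run=True)   # preview
scaffold_dashboard_tab(blueprint_json=json.dumps(blueprint), dry_run=False)  # apply
```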
Use this when the user explicitly asks for a story, narrative, executive brief, decision memo, stakeholder pitch, or recommendation artifact — anything where an audience must be moved to action rather than just shown the data. Do not auto-upgrade a standard report request into a storyteller run; if the user asks for "a report" and the scope is ambiguous, ask which mode they want.
The storyteller is an opt-in, multi-agent pipeline grounded in Cole Nussbaumer Knaflic's Storytelling with Data (Wiley, 2015). It sits alongside the standard report workflow and leaves generate_report untouched. Standard mode remains the default for dashboards, KPI packs, trend reports, and ad-hoc analysis.
Standard report mode answers "show me the data." Storyteller mode answers "convince this specific audience to take this specific action." The two have different shapes:
- Standard mode pushes charts as fast as the discovery/EDA gates allow.
- Storyteller mode refuses to touch data until it has a named audience, a required action, and a one-sentence big idea with stakes.
If the user has not named a decision-maker and a concrete ask, the storyteller pauses and asks — guessing the audience is the single most common failure mode in decision communication.
The storyteller is structured as seven cooperating personas coordinated by an orchestrator. Each persona has a narrow contract, produces a specific artifact, and hands off to the next. Gates are enforced in code (storyteller_state.py), not by convention — skipping a gate raises RuntimeError.
┌────────────────────┐
│ Orchestrator │ Mode selection + gate enforcement (no content)
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ Context Agent │ audience, required action, mechanism, tone
└─────────┬──────────┘ produces: ContextBrief
│
┌─────────▼──────────┐
│ Explorer │ runs through standard Cerebro tools
└─────────┬──────────┘ produces: candidate findings (feed, never shipped)
│
┌─────────▼──────────┐
│ Narrative Agent │ one-sentence big idea with stakes
└─────────┬──────────┘ produces: BigIdea, Storyboard (setup → tension → resolution)
│
┌─────────▼──────────┐
│ Visual Designer │ relationship-first chart choice, one focal point per scene
└─────────┬──────────┘ produces: VisualSpec per scene (action titles, de-emphasis)
│
┌─────────▼──────────┐
│ Writer │ action titles, annotations, prose, medium adaptation
└─────────┬──────────┘ produces: final_story markdown with chart placeholders
│
┌─────────▼──────────┐
│ Critic │ adversarial, cold-reader review (four clarity tests)
└─────────┬──────────┘ produces: ReviewReport (ready_for_handoff + blocking_issues)
│
┌─────────▼──────────┐
│ Accessibility & │ colorblind palette, contrast, language, tone match
│ Tone Agent │
└─────────┬──────────┘ produces: pass/fail verdict
│
┌─────────▼──────────┐
│ Handoff │ storyteller_generate_story_report
└────────────────────┘ renders via the existing generate_report pipeline
Each arrow is a hard gate. On critic failure, the pipeline loops back to the earliest phase implicated in the blocking issues (a failed reverse storyboard sends work back to Narrative, not Writer).
Why seven personas and not one monolith like analytics_reporter? The storyteller is an artifact-handoff pipeline — each phase produces a typed Pydantic artifact the next phase consumes as its entire input. Persona separation matches real information boundaries: the Critic must read the work cold, without the Writer's "make it pretty" bias. The analytics reporter is a state-accumulation pipeline where later phases need the full upstream history, so monolithic is correct there. Different shapes, different layouts.
1. storyteller_start_session
2. get_agent_persona("storyteller_orchestrator") and ("storyteller_context")
3. Collect audience, required action, mechanism, tone, background, biases
4. storyteller_record_context_brief(...) # Gate 1: context
5. Explore data with standard Cerebro tools (discover_models, execute_query, etc.)
6. get_agent_persona("storyteller_narrative")
7. storyteller_record_big_idea(sentence, stakes) # Gate 2: big idea
8. storyteller_record_storyboard(scenes, ...) # Gate 3: storyboard
9. get_agent_persona("storyteller_visual_designer")
10. storyteller_record_visual_spec(scene_index, ...) # one per scene
generate_charts([...]) # render via standard tool
storyteller_record_visual_spec(..., chart_id=...) # attach chart_id
11. get_agent_persona("storyteller_writer")
12. storyteller_record_final_story(title, content_markdown) # Gate 4: final story
13. get_agent_persona("storyteller_critic")
14. storyteller_run_clarity_checks(checks=[...]) # Gate 5: clarity review
# On failure: loop back to earliest failing phase
15. get_agent_persona("storyteller_accessibility")
16. storyteller_record_accessibility_pass(passed, notes) # Gate 6: accessibility
17. storyteller_generate_story_report() # renders via generate_report
Minimal pattern the calling LLM follows:
storyteller_start_session()
storyteller_record_context_brief(
audience="Q2 budget committee",
required_action="Approve $250k to continue the summer pilot next year",
mechanism="memo",
tone="recommendation"
)
# ... normal Cerebro discovery, EDA, correlation queries ...
storyteller_record_big_idea(
sentence="The pilot lifted perception of science by 28 points, so the committee should fund a full-year rollout."
)
storyteller_record_storyboard(
scenes=[
{"index": 0, "intent": "Set up the pilot's goal", "role": "setup"},
{"index": 1, "intent": "Show the pre-pilot baseline gap", "role": "tension"},
{"index": 2, "intent": "Show the 28-point lift", "role": "evidence"},
{"index": 3, "intent": "Ask for full-year funding", "role": "resolution"},
],
narrative_order="lead_with_ending"
)
# ... record one visual_spec per scene; render charts via generate_charts ...
storyteller_record_final_story(title, content_markdown)
storyteller_run_clarity_checks(checks=[...])
storyteller_record_accessibility_pass(passed=True)
storyteller_generate_story_report()
- Context gate. Vague audiences like "stakeholders" or "leadership" are rejected at the Pydantic layer. Required actions shorter than 10 characters are rejected. Mechanism must be one of `live_presentation`, `slide_deck_leave_behind`, `emailed_deck`, `memo`, `brief`, `dashboard_excerpt`, `script`.
- Big idea gate. Must be a complete declarative sentence with point of view and stakes. Labels ("Q3 revenue"), single words, and strings ending in a colon are rejected.
- Storyboard gate. Must contain at least one `tension` scene and at least one `resolution` scene. Flat "everything is fine" narratives are rejected.
- Visual spec gate. `pie`, `donut`, `3d`, and `dual_axis` chart families are rejected. Action titles must be sentences, not labels. Every scene index must match a scene in the recorded storyboard.
- Clarity review gate. `ready_for_handoff` is `True` only if every clarity check passes. Failures drop the pipeline back to the earliest failing phase, tagged via the `blocking_issues` list.
- Accessibility gate. Hard failures (colorblind-hostile encoding, unreadable contrast) block handoff. Soft failures warn but do not block.
All gates are code, not convention. The relevant modules are:
- `src/cerebro_mcp/storyteller_models.py` — Pydantic contracts and validators
- `src/cerebro_mcp/storyteller_state.py` — phase cursor and gate enforcement
- `src/cerebro_mcp/tools/storyteller.py` — 11 MCP tools
The storyteller does not replace the standard report pipeline. Both modes coexist:
- Standard mode (default): `discover_models` → `execute_query` → `generate_charts` → `generate_report`. Unchanged.
- Storyteller mode (opt-in): wraps the standard pipeline with gated persona handoffs. `storyteller_generate_story_report` renders the final artifact through the same `create_report_artifact` used by `generate_report`, but bypasses the standard quality gate because the storyteller has its own.
The standard session state (search_models_count, explored_models, chart registry) is preserved across a storyteller run so users can continue exploring after the story is rendered.
Do not auto-upgrade. If the user asks "give me a report on bridge activity", ask whether they want a standard report or a decision-oriented story.
Use this when the user asks about contribution attribution, ROI across incentive programs, budget optimisation, or "which emissions / rewards actually drove TVL / volume / users". MMM (Marketing Mix Modeling) adapts the Hakuhodo / Google framework to on-chain incentives: media = token emissions / LM rewards / validator APR, KPI = TVL / DEX volume / DAU / tx count.
MMM is a gated workflow. generate_report is blocked until the mmm_causal_reviewer returns VERDICT: PASS. The gate catches the most common MMM failure mode — undetected confounding (e.g. user-growth driving both incentives and KPI simultaneously) that would otherwise under-attribute the real drivers by an order of magnitude.
1. get_agent_persona("mmm_analyst")
- Runs the full SOP: spine-fill → multicollinearity → baseline → adstock → concave + Hill fit → decompose
- SQL toolkit is in the persona (geometric adstock, Hill grid search, bootstrap CIs, etc.)
- Requires ≥60 weekly rows or downgrades output to "directional only" with a banner
2. Synthesize mmm_analyst's output into a markdown DAG table (nodes, edges, confounder flags)
3. get_agent_persona("mmm_causal_reviewer")
- Pass the DAG table verbatim as the next user message
- Reviewer runs three checks: chronological, non-inclusion, identifiability
- On BLOCK: apply the prescribed fix (intervention / segmentation / front-door variable), re-submit
4. On PASS verdict:
- generate_charts (batch): contribution stacked-area, spend-vs-effectiveness share,
response-curve scatter per media, adstock decay, plus the causal-review markdown table
- generate_report
5. Optional prescription step (if user asks "what should we do next?"):
- get_agent_persona("mmm_simulator")
- Pass fitted (β, r, λ, current_spend, baseline_kpi) per media
- Simulator caps any shift at ±30%/period and returns marginal-ROI + reallocation charts
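For orientation, the two transforms the SOP names look like this (a minimal reference sketch — the persona ships its own ClickHouse SQL toolkit):

```python
# Geometric adstock: carryover with decay rate lam in [0, 1);
# each period retains lam of the previous period's accumulated effect.
def geometric_adstock(spend: list[float], lam: float) -> list[float]:
    out, carry = [], 0.0
    for x in spend:
        carry = x + lam * carry
        out.append(carry)
    return out

# Hill saturation: diminishing returns with half-saturation k and shape s.
def hill(x: float, k: float, s: float) -> float:
    return x**s / (k**s + x**s)
```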
The review is mandatory because a single open backdoor can flip a coefficient's sign (Hakuhodo Guidebook p.35–38) or under-attribute a driver by ~9.8× (Guidebook p.116 TV-under-attribution simulation). Common on-chain failures the reviewer flags:
- Inverse causation: using pay-for-performance spend as a cause of conversions — the arrow runs the other way
- Computed-from-outcome loops: using validator APR as a cause of deposits when APR is derived from deposit volume
- Co-launched confounding: two incentive programs that started the same week have correlated series — the model cannot separate them without intervention, segmentation, or a front-door intermediate (e.g. unique-wallet count)
Prescribed fixes, in priority order: (1) stagger / dark-period intervention, (2) audience or protocol segmentation, (3) front-door variable, (4) future dark-period request.
On top of the standard generate_report gates, an MMM report must include:
- Contribution stacked-area over time (series_field = media)
- Spend vs. effectiveness share (grouped bar)
- Response curve per media (scatter + fitted line)
- Adstock decay per media showing λ
- Causal-review table (markdown, from `mmm_causal_reviewer`)
See docs/cerebro-docs — MCP / MMM for the full SQL toolkit, worked example on real Gnosis App data, and the 4 SQL bugs surfaced by the live smoke test (with patched snippets).
| Tool | Use when | Notes |
|---|---|---|
| `discover_models` | best first step for analysis | search plus top model expansion in one call |
| `search_models` | broader search/triage | returns search list only |
| `get_model_details` | you already know the model name | full lineage, SQL, columns, dependencies |
| `list_tables` | browsing raw databases | paginated |
| `describe_table` | verifying exact ClickHouse schema | should happen before writing SQL |
| `get_sample_data` | checking real row shape quickly | lightweight shape inspection |
| `execute_query` | sync bounded SQL preview | structured output, truncation, optional research snapshotting |
| `explain_query` | checking query plan without running the full query | useful for troubleshooting |
| `start_query` / `get_query_results` | long-running or wider queries | paginated async result preview |
| Tool | Use when | Notes |
|---|---|---|
| `quick_chart` | one fast chart | bypasses workflow gates |
| `generate_chart` | one gated chart | okay for single additions |
| `generate_charts` | report chart batches | preferred for report workflows |
| `list_charts` | inspecting the current chart registry | charts are session-scoped |
| `generate_report` | final report assembly | consumes chart placeholders |
| `open_report` | reopen a saved report | works with saved report IDs |
| `list_reports` | browse disk reports | persistent across sessions if the directory persists |
| `export_report` | get a shareable file path or URL | SSE deployments can return HTTP download URLs |
| Tool | Use when | Notes |
|---|---|---|
| `start_research_project` | start a durable investigation | creates project state on disk |
| `get_research_project` | check project status | compact summary only |
| `plan_research_phase` | lock the plan for the current phase | phase must be current and pending |
| `execute_research_phase` | mark the phase complete and advance | deterministic state transition only |
| `attach_research_evidence` | link an artifact into the project | validates the ref first |
| `capture_schema_snapshot` | make schema inspection durable | good for mapping phase |
| `record_research_memory` | save reusable domain knowledge | backed by evidence refs |
| `record_research_finding` | save a project claim | backed by evidence refs |
| `get_research_memory` | browse stored memory | paginated |
| `get_research_evidence` | browse evidence | paginated, optional phase filter |
| `get_research_findings` | browse findings | paginated |
| `verify_research_phase` | run structural validation | publication gate |
| `prepare_peer_review` | build the review packet | client then applies review prompt |
| `record_peer_review` | persist the review result | publication remains blocked on rejection |
| `publish_research_report` | generate the final research artifact | requires verification and peer review |
| Tool | Use when | Notes |
|---|---|---|
| `list_databases` | inspect available ClickHouse databases | includes descriptions |
| `system_status` | debug server state | ClickHouse, manifest, docs, cache, tracing, transport |
| `resolve_address` | map names to addresses or labels | backed by dune_labels |
| `get_token_metadata` | verify token decimals and metadata | useful before interpreting amounts |
| `search_models_by_address` | find dbt models around a contract | contract-centric analysis |
| `search_docs` / `get_doc_chunk` / `get_docs_overview` / `get_docs_context` / `get_gnosis_chain_docs_context` / `get_gnosis_chain_doc_chunk` | query platform docs | supplementary context, curated overview, platform corpus, and Gnosis Chain corpus |
| `get_platform_constants` | chain constants and fixed references | useful before querying raw execution data |
| `save_query` / `run_saved_query` / `list_saved_queries` | reusable SQL snippets | not evidence storage |
| Tool | Use when | Notes |
|---|---|---|
| `discover_metrics` | first step for governed metric questions | scores canonical names, synonyms, docs keywords, and module hints |
| `get_metric_details` | you already know the metric name | shows root model, allowed dimensions, supported grains, docs, and reachability hints |
| `explain_metric_query` | you want the plan without execution | returns planner mode, selected/rejected paths, warnings, and compiled SQL |
| `query_metrics` | execute governed metrics with semantic provenance | uses the semantic planner, ClickHouse compiler, and one safe retry loop |
| `get_clickhouse_query_rules` | raw SQL fallback help | only registered when a valid vendored ClickHouse bundle is present |
| Tool | Use when | Notes |
|---|---|---|
| `list_custom_tools` | discovering available parameterized tools | shows all tools from custom_tools.yaml |
| (dynamically registered) | common domain questions with specific parameters | parameterized SQL, no raw query needed |
Custom tools are defined in custom_tools.yaml and registered at startup when CUSTOM_TOOLS_ENABLED=True. Each tool maps to a pre-built SQL template with typed parameters.
| Tool | Use when | Notes |
|---|---|---|
| `discover_dashboard_metrics` | browsing api_* models for dashboard building | returns chart type suggestions and column info |
| `scaffold_dashboard_tab` | generating JS + YAML for a new dashboard tab | idempotent, supports dry_run mode |
Dashboard tools are registered when DASHBOARD_BUILDER_ENABLED=True and METRICS_DASHBOARD_PATH is set.
| Tool | Use when | Notes |
|---|---|---|
| `verify_numbers` | before reporting any computed numbers to the user | checks arithmetic and optionally cross-references via `check_query` |
The agent provides its computation logic (formula + component values) and the tool independently verifies the math. If a check_query is provided, it also cross-references against an independent data source.
Opt-in, gated pipeline for decision-oriented narratives. See the Storyteller Workflow section above for the full agent pipeline. Do not auto-upgrade a standard report request into a storyteller run.
| Tool | Use when | Notes |
|---|---|---|
| `storyteller_start_session` | user explicitly asks for a story, memo, executive brief, pitch, or recommendation | clears prior storyteller state; standard SessionState is untouched |
| `storyteller_status` | checking current phase and gate status mid-workflow | read-only snapshot |
| `storyteller_end_session` | abandoning a session | clears artifacts |
| `storyteller_record_context_brief` | after collecting audience, action, mechanism, tone | Gate 1. Rejects vague audiences and un-articulable actions at the Pydantic layer |
| `storyteller_record_big_idea` | after writing one declarative sentence with stakes | Gate 2. Rejects labels and single-word "ideas" |
| `storyteller_record_storyboard` | after mapping scenes to setup → tension → resolution | Gate 3. Rejects storyboards with no tension or no resolution |
| `storyteller_record_visual_spec` | one call per storyboard scene | Gate 4. Rejects pie/donut/3D/dual-axis; rejects descriptive titles; rejects scene indexes not in the storyboard |
| `storyteller_record_final_story` | after assembling markdown with `{{chart:CHART_ID}}` placeholders | Gate 5. Requires every scene to have a visual spec |
| `storyteller_run_clarity_checks` | after running title-only readthrough, reverse storyboard, fresh-eye, and audits | Gate 6. `ready_for_handoff=True` only if all checks pass; failures loop back to earliest failing phase |
| `storyteller_record_accessibility_pass` | after colorblind, contrast, language, and tone checks | Gate 7. Hard failures block handoff |
| `storyteller_generate_story_report` | after every gate has passed | renders through the existing `create_report_artifact`; standard session state is preserved |
The seven persona briefs are loaded via get_agent_persona(role) with roles storyteller_orchestrator, storyteller_context, storyteller_narrative, storyteller_visual_designer, storyteller_writer, storyteller_critic, storyteller_accessibility. Each persona has a narrow contract and is adopted only for its phase.
Prompts are guidance text returned by the MCP server. They do not execute analysis by themselves.
Prompts available today:
- `getting_started`
- `analyze_data(topic)`
- `explore_protocol(protocol)`
- `write_query(question, database="dbt")`
- `report(period, topics, focus)`
- `adopt_persona_cerebro_dispatcher` — start here for non-trivial requests
- `adopt_persona_analytics_reporter`
- `adopt_persona_gnosis_research_analyst`
- `adopt_persona_ui_designer`
- `adopt_persona_reality_checker`
- `conduct_research_peer_review(packet_json)`
- `orchestrator(user_request)`, `data_engineer(task)`, `data_scientist(task)`, `frontend_agent(task)` — per-task decomposition scaffolds
See the Agent Fleet section above for the full 23-persona catalog loadable via get_agent_persona(role). Personas are guidance, not automation; each is adopted for a specific phase of work and produces specific artifacts.
The seven storyteller personas are split by design — each is adopted for exactly one phase of an artifact-handoff pipeline. The three MMM personas split orchestration, DAG review, and simulation along the same lines. See the Storyteller Workflow and MMM Workflow sections for the gate semantics.
Resources expose stable reference material:
- platform overview
- ClickHouse SQL guide
- chain parameters
- address directory
- metric definitions
- query cookbook
Semantic resources:
- `gnosis://semantic-model/{name}`
- `gnosis://semantic-metric/{name}`
- `gnosis://semantic-relationship/{name}`
- `gnosis://semantic-module/{module}`
- `gnosis://semantic-graph-overview`
If a client wants to use a persona, it should explicitly load the prompt or call get_agent_persona.
This part is easy to misunderstand, so the distinctions matter.
- charts are stored in memory
- chart IDs are session-scoped
- chart registry entries are pruned after a TTL
- if the server restarts, chart IDs are lost
Use charts when you are actively building a report in the current session.
- reports are saved as standalone HTML on disk
- reports can be reopened later by report ID
- in SSE mode, reports can be exposed through `/reports/{id}`
- `export_report` can return a local path or a download URL
Use reports for durable visual deliverables.
- saved queries store SQL and metadata
- they do not store the executed result set
- they are useful for reuse, not as evidence snapshots
- persisted sync query results are saved inside the research project
- these are the right artifact for durable SQL evidence
- they are created by `execute_query(..., persist_result=True)`
- Python 3.10+
- Node.js 20+ for building the bundled report UI
- ClickHouse credentials for the Gnosis analytics warehouse
git clone https://github.com/gnosischain/cerebro-mcp.git
cd cerebro-mcp
cp .env.example .env

For local development, the code default is already a writable local path: .cerebro/research_projects.
Recommended local overrides in .env:
CLICKHOUSE_PASSWORD=your_password_here
CEREBRO_RESEARCH_DIR=.cerebro/research_projects
THINKING_LOG_DIR=.cerebro/logs
ASYNC_RESULT_DIR=.cerebro/query_results
SEMANTIC_ENABLED=False

Then build and install:
make install

What make install does:
- builds the report UI bundle
- copies it into `src/cerebro_mcp/static/report.html`
- installs the Python package in editable mode
Semantic is opt-in at deployment time. The single control surface is:
SEMANTIC_ENABLED=True

When semantic is enabled, the server also needs access to:
- `manifest.json`
- `catalog.json`
- `semantic_registry.json`
- `semantic_docs_index.json`
The default hosted URLs already point at the published dbt-cerebro artifacts. For local or private deployments, switch to filesystem paths or alternate URLs:
DBT_MANIFEST_PATH=/absolute/path/to/manifest.json
DBT_CATALOG_PATH=/absolute/path/to/catalog.json
SEMANTIC_REGISTRY_PATH=/absolute/path/to/semantic_registry.json
SEMANTIC_DOCS_INDEX_PATH=/absolute/path/to/semantic_docs_index.json

Semantic runtime states are intentionally narrow:
- `SEMANTIC_ENABLED=False`
  - semantic tools and semantic resources are not registered
- `SEMANTIC_ENABLED=True` with healthy artifacts
  - semantic tools and resources are registered and executable
- `SEMANTIC_ENABLED=True` with stale or unavailable artifacts
  - semantic tools and resources stay registered
  - semantic discovery and docs still work when possible
  - semantic execution returns a graceful unavailable or fallback response instead of silently switching to raw SQL
This is the default mode for local desktop MCP hosts:
cerebro-mcp

Use this for remote or browser-accessed MCP clients:
export MCP_AUTH_TOKEN=replace_me
cerebro-mcp --sse

SSE behavior:
- binds to `FASTMCP_HOST` / `FASTMCP_PORT`
- requires `Authorization: Bearer <token>` by default
- `/health` is public
- `/reports/{id}` accepts bearer auth or `?token=...`
To disable auth for local testing only:
ALLOW_INSECURE_REMOTE_TRANSPORT=True

Build:
docker build -t cerebro-mcp .

Run:
docker run \
--env-file .env \
-p 8000:8000 \
-v "$(pwd)/data:/data" \
cerebro-mcp

Important for Docker:
- the image uses `/data` for persistent storage
- mounted `/data` must be writable by the container user (uid 1000)
- reports, saved queries, logs, and research projects all live under `/data` in the image defaults
- if you use local semantic artifact paths instead of published URLs, mount those files into the container and point the corresponding `*_PATH` settings at the mounted location
If cerebro-mcp is on your PATH:
{
"mcpServers": {
"cerebro": {
"command": "cerebro-mcp",
"env": {
"CLICKHOUSE_PASSWORD": "your_password",
"CEREBRO_RESEARCH_DIR": ".cerebro/research_projects"
}
}
}
}

Using uv from a checked-out repo:
{
"mcpServers": {
"cerebro": {
"command": "/path/to/uv",
"args": ["--directory", "/path/to/cerebro-mcp", "run", "cerebro-mcp"]
}
}
}

VS Code uses a `servers` key instead of `mcpServers`:

{
"servers": {
"cerebro": {
"command": "/path/to/uv",
"args": ["--directory", "/path/to/cerebro-mcp", "run", "cerebro-mcp"]
}
}
}

To connect to a hosted SSE deployment instead:

{
"mcpServers": {
"cerebro": {
"url": "https://mcp.analytics.gnosis.io/sse",
"headers": {
"Authorization": "Bearer <token>"
}
}
}
}
}/health performs a real ClickHouse connectivity check and returns:
- `200` when ClickHouse is reachable
- `503` when ClickHouse is not reachable
- a JSON body that includes `clickhouse_version` and `ssl_trust_injected`
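A quick manual probe (assumes a local SSE deployment on the default port; the payload shape is illustrative):

```python
import requests

r = requests.get("http://localhost:8000/health")
print(r.status_code)  # 200 when ClickHouse is reachable, 503 otherwise
print(r.json())       # includes clickhouse_version and ssl_trust_injected
```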
In SSE mode:
GET /reports/{report_id}
Behavior:
- accepts full UUID or unique short prefix
- returns standalone HTML
- supports bearer header or `?token=...`
At startup the server tries to load:
- the dbt manifest
- the dbt catalog
- the semantic registry when semantic is enabled
- the semantic docs index when semantic is enabled
- the external docs index
If these fail, dbt and docs-assisted capabilities degrade, but the server can still start.
Semantic-specific behavior:
- semantic loading only happens in `main()`, not at import time
- when the semantic registry hash is stale against the manifest or catalog, semantic execution is disabled but semantic discovery/docs can remain available
- semantic snapshot reloads happen off-thread and the runtime swaps to the new snapshot under a lock
Important for stdio clients:
- the MCP transport expects JSON-RPC on `stdout` only
- any plain-text startup logging sent to `stdout` will break the connection
- Cerebro sends startup diagnostics through logging/stderr instead
get_clickhouse_query_rules is intentionally strict. It is registered only when a valid vendored bundle exists under CLICKHOUSE_AGENT_SKILLS_PATH.
The canonical sync flow is:
python scripts/sync_clickhouse_skills.py /path/to/local/agent-skills-checkout --ref <pinned_commit>The sync script copies only the required upstream content:
- `skills/clickhouse-best-practices/**`
- `LICENSE`
- `NOTICE`
- a local `bundle_manifest.json` containing the pinned source ref and deterministic compiled rules path
The implementation environment can stay offline as long as the operator provides a local checkout of the upstream repo. The server will not register get_clickhouse_query_rules from a partially copied directory.
All settings are environment variables or .env values.
| Variable | Default | Description |
|---|---|---|
| `DBT_MANIFEST_URL` | published dbt-cerebro manifest URL | remote manifest source; takes precedence over local path |
| `DBT_MANIFEST_PATH` | empty | local manifest fallback |
| `DBT_CATALOG_URL` | published dbt-cerebro catalog URL | remote catalog source; takes precedence over local path |
| `DBT_CATALOG_PATH` | empty | local catalog fallback |
| `DOCS_BASE_URL` | https://docs.analytics.gnosis.io/ | canonical docs host used to derive llms.txt, llms-ctx*.txt, and page mirror URLs |
| `DOCS_SEARCH_INDEX_URL` | published docs search URL | external platform docs index |
| `DOCS_SEARCH_INDEX_PATH` | empty | local docs index fallback |
| `DOCS_REFRESH_INTERVAL_SECONDS` | 3600 | docs index refresh cadence |
| `GNOSIS_CHAIN_DOCS_LLM_URL` | https://docs.gnosischain.com/llms.txt | Gnosis Chain docs context artifact used for search and broad chain context |
| `SEMANTIC_ENABLED` | False | deployment-level semantic switch; the only semantic on/off control |
| `SEMANTIC_REGISTRY_URL` | published dbt-cerebro registry URL | remote semantic registry source |
| `SEMANTIC_REGISTRY_PATH` | empty | local semantic registry fallback |
| `SEMANTIC_DOCS_INDEX_URL` | published dbt-cerebro semantic docs URL | remote semantic docs index source |
| `SEMANTIC_DOCS_INDEX_PATH` | empty | local semantic docs index fallback |
| `SEMANTIC_REFRESH_INTERVAL_SECONDS` | 300 | semantic snapshot refresh cadence |
| `CLICKHOUSE_AGENT_SKILLS_PATH` | src/cerebro_mcp/static/clickhouse_agent_skills | vendored ClickHouse rules bundle root |
| Variable | Default | Description |
|---|---|---|
| `DASHBOARD_BUILDER_ENABLED` | False | enable dashboard scaffolding tools |
| `METRICS_DASHBOARD_PATH` | empty | absolute path to metrics-dashboard repo root |
| `CUSTOM_TOOLS_ENABLED` | False | enable YAML-defined parameterized query tools |
| `CUSTOM_TOOLS_PATH` | empty | path to custom_tools.yaml config file |
| Variable | Default | Description |
|---|---|---|
| `CLICKHOUSE_HOST` | ujt1j3jrk0.eu-central-1.aws.clickhouse.cloud | ClickHouse server |
| `CLICKHOUSE_PORT` | 8443 | ClickHouse port |
| `CLICKHOUSE_USER` | default | ClickHouse user |
| `CLICKHOUSE_PASSWORD` | empty | ClickHouse password |
| `CLICKHOUSE_SECURE` | True | use TLS |
| `CLICKHOUSE_VERIFY` | True | verify TLS certificates |
| `CLICKHOUSE_CONNECT_TIMEOUT` | 30 | connect timeout |
| `CLICKHOUSE_SEND_RECEIVE_TIMEOUT` | 300 | socket timeout |
| `CLICKHOUSE_QUERY_TIMEOUT_SECONDS` | 30 | server-side execution timeout |
| `QUERY_TIMEOUT_SECONDS` | 30 | deprecated fallback |
| `MAX_ROWS` | 10000 | max query result rows for internal execution |
| `MAX_QUERY_LENGTH` | 10000 | max accepted SQL length |
| Variable | Default | Description |
|---|---|---|
| `TOOL_RESULT_MAX_ROWS` | 200 | max rows returned to sync tool consumers |
| `TOOL_RESULT_MAX_CHARS` | 40000 | max serialized sync tool payload size |
| `TOOL_SUMMARY_BUDGET_RATIO` | 0.9 | fraction reserved for summary_markdown |
| `TOOL_RESPONSE_MAX_CHARS` | 40000 | legacy fallback if TOOL_RESULT_MAX_CHARS is unset |
| Variable | Default | Description |
|---|---|---|
| `ASYNC_RESULT_PAGE_SIZE` | 200 | async result preview page size |
| `ASYNC_RESULT_MEMORY_THRESHOLD_BYTES` | 5000000 | spill large async results to disk |
| `ASYNC_RESULT_DIR` | .cerebro/query_results | async result storage |
| Variable | Default | Description |
|---|---|---|
| `CEREBRO_RESEARCH_DIR` | .cerebro/research_projects | durable research project root for local/dev; Docker overrides to /data/research_projects |
| `RESEARCH_PAGE_SIZE_DEFAULT` | 20 | default page size for research listings |
| `RESEARCH_PAGE_SIZE_MAX` | 100 | max page size for research listings |
| Variable | Default | Description |
|---|---|---|
| `CEREBRO_REPORT_DIR` | code fallback ~/.cerebro/reports, Docker override /data/reports | report storage |
| `CEREBRO_SAVED_QUERIES_DIR` | tool fallback ~/.cerebro-mcp, Docker override /data/saved-queries | saved SQL snippets |
| `THINKING_LOG_DIR` | .cerebro/logs | reasoning trace directory |
| `THINKING_ALWAYS_ON` | True | auto capture tool calls |
| `THINKING_LOG_RETENTION_DAYS` | 30 | trace retention |
| `REPORT_BASE_URL` | empty | URL prefix for exported reports |
| `MCP_AUTH_TOKEN` | empty | required for SSE unless insecure mode is enabled |
| `ALLOW_INSECURE_REMOTE_TRANSPORT` | False | disable SSE auth for local testing |
| `FASTMCP_HOST` | 0.0.0.0 | SSE bind host |
| `FASTMCP_PORT` | 8000 | SSE bind port |
| Variable | Default | Description |
|---|---|---|
| `MCP_SECURITY_POLICY_MODE` | log_only | security policy mode; future: warn, enforce |
| `MCP_SECURITY_LOG_DIR` | .cerebro/security_audit | directory for daily JSONL security audit logs |
| `MCP_EXPECTED_MANIFEST_SHA256` | empty | optional manifest pin; empty disables hash verification |
| Variable | Default | Description |
|---|---|---|
| `ENFORCE_CHART_PRECONDITIONS` | True | enable chart/report gates |
| `MIN_MODELS_DETAILED` | 3 | model detail explorations required before gated charting |
| `MIN_TABLES_VERIFIED` | 1 | tables to verify before gated charting |
| `MIN_CHARTS_FOR_REPORT` | 3 | minimum registered charts before report creation |
| `MIN_EXPLORATORY_QUERIES` | 2 | minimum prior query executions before report creation |
| `REQUIRE_DIMENSIONAL_BREAKDOWN` | True | require a dimensional breakdown in report flow |
| `REQUIRE_RELATIONAL_CHART` | True | require relational analysis in report flow |
Use system_status() to inspect the live resolved configuration, cache sizes, transport mode, and ClickHouse connectivity.
Cerebro MCP exposes comprehensive Prometheus metrics and structured JSON logs. A ready-to-import Grafana dashboard is provided at grafana/cerebro-mcp-observability.json.
The dashboard covers 9 sections:
- Overview — replicas, pods, restarts, images
- HTTP / SSE — request rate, p95 latency, 4xx/5xx errors
- MCP Internals — MCP request/tool call rates, latency, top failing tools
- Security Audit — suspicious calls, high-risk tool calls, app-only calls, report auth denials
- Tool Usage Details — top tools by volume, call distribution, error rates, p99 latency
- Semantic Layer — semantic state, registry stats, query attempts, route decisions, planner failures
- ClickHouse — query rate, latency, errors, rows returned
- Pod Resources — CPU, memory, network, throttling, restarts
- Logs — structured logs, tool calls, security audit events, artifact reloads
Prometheus metrics include:
- `cerebro_http_*`, `cerebro_mcp_*` — HTTP and MCP protocol metrics
- `cerebro_clickhouse_*` — ClickHouse query metrics
- `cerebro_security_*` — security audit counters (high-risk calls, suspicious flags, app-only calls)
- `cerebro_report_token_auth_total` — report endpoint auth events
- `semantic_*` — semantic layer metrics (query attempts, planner failures, repairs, fallbacks, snapshot age)
Low-cardinality metrics track tool calls, planner failures, retries, fallbacks, and security events by tool name, risk class, and transport. High-cardinality details (resolved metrics, SQL hash, ClickHouse error text) remain in structured logs and reasoning traces.
See docs/observability.md for the full metric catalog, Grafana setup, and structured log event reference.
The server enforces the following at the execution layer:
- read-only SQL only
- blocked DDL and DML
- blocked `FORMAT`, `SETTINGS`, and external reader functions such as `s3(...)`, `url(...)`, `remote(...)`
- database allowlist
- identifier validation
- forced result capping even when a query already contains `LIMIT`
- JSON-safe normalization for ClickHouse values
- best-effort OS trust-store injection for TLS
- custom tools use ClickHouse native parameter binding — immune to SQL injection
- custom tool SQL templates are peer-reviewed and static; the LLM only provides parameter values
- dashboard scaffold tool validates blueprints via Pydantic before any file writes
- `verify_numbers` tool enforces arithmetic verification before numerical claims reach the user
- tool risk classification: every tool is assigned a risk class (`read_only`, `server_state_write`, `workspace_write`, `subprocess`, `app_only`)
- suspicious-call detection: flags app-only tool invocations, workspace writes over SSE, and unknown tools
- append-only security audit log: one JSONL file per day with redacted arguments, SHA-256 hashes, risk class, and suspicious flags
- security layer is observation-only (`log_only` mode) — never blocks tool execution
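An audit record looks roughly like this (field names are assumptions — docs/security.md has the actual event schema):

```python
# One JSONL line per tool call, appended to the daily audit file.
{
    "ts": "2025-06-01T12:00:00Z",
    "tool": "execute_query",
    "risk_class": "read_only",
    "args_sha256": "ab12cd34",  # hash of the redacted arguments
    "suspicious": False,
}
```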
See docs/security.md for the full security architecture, risk registry, and audit event schema.
Recommended operational hardening:
- use a least-privilege ClickHouse user
- keep `CLICKHOUSE_VERIFY=True`
- require `MCP_AUTH_TOKEN` on remote SSE
- monitor `cerebro_security_suspicious_calls_total` in Grafana for anomalous tool invocations
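For the last point, an alerting-style query over that counter (the `tool` label is an assumption about how the counter is labelled):

```promql
sum(rate(cerebro_security_suspicious_calls_total[5m])) by (tool)
```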
Saved-query reminder:
- `save_query` is for reusable SQL
- it is not evidence storage
- use research snapshots when you need durable query evidence
get_agent_persona("cerebro_dispatcher")
The dispatcher classifies intent, runs preflight_analytics_request, picks the specialist chain, and emits a mandatory dispatch manifest. The patterns below are subordinate to that manifest. Skip the dispatcher only for trivial turns ("hi", "list reports", "open report 3"), explicit specialist invocations ("use forecasting_analyst on validator count"), or follow-up turns inside an already-dispatched workflow.
Use:
- `discover_models`
- `get_model_details` if needed
- `describe_table`
Use:
- `discover_metrics` and `query_metrics` first if the request maps cleanly to a governed metric
- otherwise `discover_models`
- `describe_table`
- `execute_query`
If semantic execution is unavailable or unsupported, follow the returned fallback reason and use raw SQL.
If the result is large or slow:
- `start_query`
- `get_query_results`
Use:
- `quick_chart` for a one-off plot
- `generate_chart` if you are already in report mode and only need one chart
Use:
- `discover_models`
- `describe_table`
- `execute_query` for EDA and support queries
- `generate_charts`
- `generate_report`
Use the research workflow:
- `start_research_project`
- phase planning and execution
- persisted evidence capture
- verification
- peer review
- publication
Use the MMM workflow. This is a gated flow — generate_report is blocked until mmm_causal_reviewer returns VERDICT: PASS.
get_agent_persona("mmm_analyst")- Follow the SOP: spine-fill → multicollinearity → baseline → adstock → fit → decompose
- Synthesize the DAG as a markdown table
get_agent_persona("mmm_causal_reviewer")— pass the DAG; iterate until PASSgenerate_charts(5 required charts) →generate_report- Optional:
get_agent_persona("mmm_simulator")for ±30%-bounded budget reallocation
Use the storyteller workflow when the user explicitly asks for a story, narrative, memo, brief, pitch, or recommendation artifact. Do not auto-upgrade a standard report request. If the scope is ambiguous ("give me a report on X"), ask which mode the user wants.
- `storyteller_start_session`
- Adopt `get_agent_persona("storyteller_orchestrator")` and `("storyteller_context")`
- Collect audience, required action, mechanism, tone from the user
- `storyteller_record_context_brief(...)` — rejects vague audiences
- Run normal Cerebro discovery and EDA to gather evidence
- `storyteller_record_big_idea(sentence, stakes)` — one declarative sentence
- `storyteller_record_storyboard(scenes, narrative_order)` — setup → tension → resolution
- For each scene: `storyteller_record_visual_spec(...)` then `generate_charts` then attach `chart_id`
- `storyteller_record_final_story(title, content_markdown)`
- `storyteller_run_clarity_checks(checks=[...])` — loops back on failure
- `storyteller_record_accessibility_pass(passed, notes)`
- `storyteller_generate_story_report()`
cerebro-mcp/
├── src/cerebro_mcp/
│ ├── server.py
│ ├── bootstrap.py
│ ├── config.py
│ ├── clickhouse_client.py
│ ├── artifact_loader.py
│ ├── catalog_loader.py
│ ├── tool_models.py
│ ├── tool_output.py
│ ├── safety.py
│ ├── manifest_loader.py
│ ├── docs_loader.py
│ ├── semantic_loader.py
│ ├── semantic_models.py
│ ├── semantic_index.py
│ ├── semantic_graph.py
│ ├── semantic_planner.py
│ ├── semantic_sql_compiler.py
│ ├── dashboard_models.py
│ ├── custom_tool_models.py
│ ├── research_models.py
│ ├── research_store.py
│ ├── research_workflow.py
│ ├── storyteller_models.py
│ ├── storyteller_state.py
│ ├── tools/
│ │ ├── query.py
│ │ ├── query_async.py
│ │ ├── schema.py
│ │ ├── dbt.py
│ │ ├── metadata.py
│ │ ├── saved_queries.py
│ │ ├── visualization.py
│ │ ├── research.py
│ │ ├── session_state.py
│ │ ├── reasoning.py
│ │ ├── agents.py
│ │ ├── dashboard_builder.py
│ │ ├── custom_queries.py
│ │ ├── cross_check.py
│ │ ├── storyteller.py
│ │ ├── semantic.py
│ │ ├── mini_apps.py
│ │ ├── metric_lab.py
│ │ └── token_explorer.py
│ ├── security.py
│ ├── mini_app_cache.py
│ ├── mini_app_models.py
│ ├── prompts/
│ ├── resources/
│ └── static/
├── custom_tools.yaml
├── docs/
│ ├── MINI_APPS.md
│ ├── security.md
│ └── observability.md
├── grafana/
│ └── cerebro-mcp-observability.json
├── scripts/
├── ui/
├── tests/
├── Dockerfile
├── Makefile
└── .env.example
make build-ui
make install
make dev
pytest -v

Useful local checks:
python -m compileall src tests
cerebro-mcp
cerebro-mcp --sse
python scripts/sync_clickhouse_skills.py /path/to/local/agent-skills-checkout --ref <pinned_commit>

MCP Inspector example:
uv run mcp dev src/cerebro_mcp/server.py

You can test semantic MCP features locally before publishing any artifacts remotely.
- Build fresh artifacts in `dbt-cerebro`:
cd /path/to/dbt-cerebro
dbt docs generate
python scripts/semantic/build_registry.py --validate --target-dir target
python scripts/semantic/build_semantic_docs.py --target-dir target

- Point `cerebro-mcp` at the local artifact paths and clear the remote URLs:
SEMANTIC_ENABLED=true
DBT_MANIFEST_URL=
DBT_CATALOG_URL=
SEMANTIC_REGISTRY_URL=
SEMANTIC_DOCS_INDEX_URL=
DBT_MANIFEST_PATH=/absolute/path/to/dbt-cerebro/target/manifest.json
DBT_CATALOG_PATH=/absolute/path/to/dbt-cerebro/target/catalog.json
SEMANTIC_REGISTRY_PATH=/absolute/path/to/dbt-cerebro/target/semantic_registry.json
SEMANTIC_DOCS_INDEX_PATH=/absolute/path/to/dbt-cerebro/target/semantic_docs_index.json

- Start the server and validate semantic behavior:
cerebro-mcp

Recommended checks:
- `discover_metrics`
- `get_metric_details`
- `explain_metric_query`
- `query_metrics`
Runtime behavior:
- `SEMANTIC_ENABLED=false`: semantic tools and resources are not registered
- `SEMANTIC_ENABLED=true` with healthy local artifacts: semantic tools are registered and executable
- `SEMANTIC_ENABLED=true` with stale or mismatched artifacts: semantic tools remain available, but execution returns graceful unavailable or coverage-gap guidance
cerebro-mcp now treats semantic execution as approved-only:
- approved metrics can be discovered and executed
- candidate metrics remain visible in the registry/docs layer but are not executable
- when semantic coverage is missing, MCP returns a structured semantic coverage gap and falls back to the raw SQL path instead of silently mixing candidate assets into execution
Semantic resources now prefer the generated semantic page bodies referenced by semantic_docs_index.json, with JSON fallback only when a page body is unavailable.
| Package | Purpose |
|---|---|
| `mcp[cli]` | FastMCP server and MCP protocol support |
| `clickhouse-connect` | ClickHouse client |
| `pyarrow` | Arrow fetch path for ClickHouse |
| `pydantic-settings` | environment-backed settings |
| `python-dotenv` | .env loading |
| `requests` | manifest/docs HTTP fetching |
| `truststore` | OS trust-store TLS integration |
| `PyYAML` | YAML parsing for dashboard configs and custom tool definitions |
Frontend stack:
- React
- ECharts
- Tailwind
- `@modelcontextprotocol/ext-apps`
See LICENSE.
