A local-first personal knowledge base with task management and memory retrieval, exposing token-budgeted tools to AI agents over MCP.
Long-running AI agents face a hard constraint: context windows are finite, and filling them with irrelevant content wastes both money and reasoning quality. Research on agentic memory systems — MemGPT, Generative Agents, Mem0 — has converged on the same insight: agents need explicit mechanisms to control what they read and how much they read, not just better retrieval.
Existing tools don't solve this well:
- Knowledge graph tools (Obsidian plugins, etc.) are built for human navigation with no concept of token budgets or machine-readable retrieval APIs.
- RAG-as-a-service (Pinecone, Weaviate cloud) requires network calls, incurs per-query costs, and sends your notes to third parties — with only simple top-k vector search.
- Local vector databases (sqlite-vec, Chroma) provide semantic similarity but miss keyword search, recency, link structure, and tag matching. A query for "meeting notes from last Tuesday" fails entirely on pure vector search.
brain takes a different position: run everything locally, combine all retrieval signals into a hybrid score, enforce token budgets at the API level, and treat Markdown files as the durable source of truth.
brain is a Rust daemon that manages tasks, records, and Markdown notes. It incrementally indexes content into a dual-store system (SQLite for full-text search and metadata, LanceDB for 384-dim vector embeddings) and serves retrieval tools over MCP stdio JSON-RPC.
Token budgeting is a first-class constraint. `memory.retrieve` supports three levels of detail (L0 extractive abstracts, ~100 tokens; L1 LLM summaries, ~2,000 tokens; L2 full source). Agents choose their budget upfront and get LOD-adjusted content in a single call, spending context-window space efficiently.
Hybrid scoring combines six signals — vector similarity, BM25 keyword ranking, recency decay, backlink count, tag match, and importance — into a single relevance score. A strategy parameter shifts signal weights per query type: lookup upweights keyword precision; planning upweights recency and link structure; synthesis upweights semantic similarity.
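The strategy-weighted combination can be sketched as follows. The six signal names come from the description above, but the weight values and the normalization are illustrative assumptions, not brain's actual intent profiles (see docs/ARCHITECTURE.md for the real formula):

```python
# Illustrative sketch of strategy-weighted hybrid scoring.
# Signal names come from the README; the weight values below are
# invented for illustration and are NOT brain's actual profiles.

SIGNALS = ("vector", "bm25", "recency", "backlinks", "tag_match", "importance")

# Hypothetical per-strategy weight profiles (each sums to 1.0).
PROFILES = {
    "lookup":    {"vector": 0.15, "bm25": 0.45, "recency": 0.10, "backlinks": 0.10, "tag_match": 0.15, "importance": 0.05},
    "planning":  {"vector": 0.15, "bm25": 0.10, "recency": 0.35, "backlinks": 0.25, "tag_match": 0.10, "importance": 0.05},
    "synthesis": {"vector": 0.45, "bm25": 0.10, "recency": 0.10, "backlinks": 0.15, "tag_match": 0.10, "importance": 0.10},
}

def hybrid_score(signals: dict, strategy: str = "lookup") -> float:
    """Combine per-signal scores (each normalized to [0, 1]) into one relevance score."""
    weights = PROFILES[strategy]
    return sum(weights[name] * signals.get(name, 0.0) for name in SIGNALS)

# A chunk that is semantically close and recent, but a poor keyword match:
chunk = {"vector": 0.9, "bm25": 0.2, "recency": 0.8, "backlinks": 0.5, "tag_match": 0.0, "importance": 0.3}
print(round(hybrid_score(chunk, "synthesis"), 3))  # -> 0.61
```

The same chunk scores differently under each strategy, which is the point: a `lookup` query punishes the weak BM25 match, while `synthesis` rewards the semantic similarity.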
Everything runs on-device. No network calls, no API keys, no ongoing cost after the initial ~130MB model download. The entire index can be rebuilt from the Markdown files at any time.
An event-sourced task system with dependencies, labels, and comments. Tasks are automatically indexed into the memory system as they are created or updated, making them semantically searchable alongside notes.
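Event sourcing means a task's current state is derived by replaying its event log rather than mutating a row in place. A minimal sketch of the idea (the event names and fields here are illustrative, not brain's actual schema):

```python
# Minimal event-sourcing sketch: task state is a fold over its event log.
# Event kinds and fields are illustrative, not brain's actual schema.
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    title: str = ""
    status: str = "open"
    labels: set = field(default_factory=set)

def apply(task: Task, event: dict) -> Task:
    """Apply one event to the task projection."""
    kind = event["kind"]
    if kind == "created":
        task.title = event["title"]
    elif kind == "status_changed":
        task.status = event["status"]
    elif kind == "label_added":
        task.labels.add(event["label"])
    return task

log = [
    {"kind": "created", "title": "Ship v0.2"},
    {"kind": "label_added", "label": "area:infra"},
    {"kind": "status_changed", "status": "done"},
]
task = Task(id="BRN-01ABC")
for event in log:
    task = apply(task, event)
print(task.status)  # -> done
```

Because the log is the source of truth, the projection can always be rebuilt, and every status change, dependency, and comment leaves an auditable trail.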
Hybrid retrieval combining vector search (LanceDB) and keyword search (SQLite FTS5). Supports episodic memory (episodes), procedural memory (procedures), and reflective synthesis (reflections).
Content-addressed storage for typed work products and snapshots. Records can be linked to tasks or note chunks, providing a durable audit trail of documents, analyses, plans, and state captures.
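Content addressing means a payload's storage key is a hash of its bytes, so identical payloads deduplicate and any stored content is tamper-evident. A sketch of the pattern, using SHA-256 from the standard library (brain's actual hash choice for records is not specified here):

```python
# Content-addressed storage sketch: the payload's hash IS its storage key.
# SHA-256 is used for illustration; brain's actual hash function may differ.
import hashlib

store: dict = {}

def put(payload: bytes) -> str:
    key = hashlib.sha256(payload).hexdigest()
    store[key] = payload  # identical payloads dedupe to the same key
    return key

def get(key: str) -> bytes:
    payload = store[key]
    # Re-hashing on read makes corruption detectable.
    assert hashlib.sha256(payload).hexdigest() == key
    return payload

a = put(b"analysis: latency regression traced to fsync")
b = put(b"analysis: latency regression traced to fsync")
print(a == b)  # -> True: same content, same address
```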
Markdown-based knowledge base. brain watches your note directories and incrementally indexes changes, extracting wiki-links, tags, and heading-aware chunks.
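Heading-aware chunking splits a note at its headings so each chunk carries its heading as context. A simplified sketch of the idea (brain's actual chunker also extracts wiki-links and tags and enforces size limits):

```python
# Heading-aware chunking sketch: split Markdown at headings so every chunk
# keeps its heading for context. Simplified; brain's real chunker does more.
import re

def chunk_by_heading(markdown: str) -> list:
    """Return (heading, body) pairs; text before the first heading gets ''."""
    chunks, heading, body = [], "", []
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            if body or heading:
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1), []
        else:
            body.append(line)
    chunks.append((heading, "\n".join(body).strip()))
    return chunks

note = "# Weekly Review\nDone: indexing.\n## Next\n- ship LOD levels\n"
for heading, body in chunk_by_heading(note):
    print(heading, "->", body)
```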
An internal job system for deferred and recurring work. Handles background tasks like embedding stale content, generating summaries, and consolidating episodic memories into reflections.
Real-time multi-agent coordination. Enables multiple agents to share a coordination room, exchange findings, and resolve blockers during complex multi-step tasks.
Install via Homebrew:

```sh
brew install benediktms/brain/brain
```

Or with the installer script:

```sh
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/benediktms/brain/releases/latest/download/brain-installer.sh | sh
```

Building from source requires Rust (stable, edition 2024), `just`, and protobuf (`brew install protobuf` or `apt install protobuf-compiler`):

```sh
git clone https://github.com/benediktms/brain.git
cd brain
just install
```

Then initialize and start the daemon:

```sh
brain init ~/notes
brain daemon start
```

`brain init` creates the brain configuration, downloads the embedding model (~130 MB on first run), and performs the initial index. `brain daemon start` launches the daemon and registers it to auto-start on login (launchd on macOS, systemd on Linux).
brain uses BGE-small-en-v1.5 for embeddings. The model is downloaded once and verified via BLAKE3 checksums at every startup.
```
~/.brain/models/bge-small-en-v1.5/
  config.json         # BERT config (hidden_size=384)
  tokenizer.json      # WordPiece tokenizer
  model.safetensors   # Model weights (~130MB, memory-mapped)
```
Download the model (choose one):
```sh
# Option 1: Run the setup script directly (requires curl; installs the Hugging Face CLI if needed)
curl -sSL https://raw.githubusercontent.com/benediktms/brain/master/scripts/setup-model.sh | bash

# Option 2: If you already have the Hugging Face CLI installed
hf download BAAI/bge-small-en-v1.5 config.json tokenizer.json model.safetensors \
  --local-dir ~/.brain/models/bge-small-en-v1.5
```

Override the model location with `BRAIN_MODEL_DIR` or `BRAIN_HOME`. If a file is corrupted or swapped, brain reports a checksum mismatch with the expected and actual hashes; re-download the model to fix it.
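The startup check follows a standard checksum-verification pattern. A sketch of that pattern, using `blake2b` from Python's standard library purely as a stand-in (brain actually uses BLAKE3, which is not in the stdlib):

```python
# Checksum-verification sketch. brain uses BLAKE3; Python's stdlib lacks it,
# so blake2b stands in here purely to illustrate the startup check.
import hashlib
import tempfile
from pathlib import Path

def verify(path: Path, expected_hex: str) -> None:
    """Raise with both hashes if the file on disk doesn't match."""
    actual = hashlib.blake2b(path.read_bytes()).hexdigest()
    if actual != expected_hex:
        raise ValueError(f"checksum mismatch for {path}: expected {expected_hex}, got {actual}")

# Demo against a temp file standing in for model.safetensors:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"weights")
path = Path(f.name)
good = hashlib.blake2b(b"weights").hexdigest()
verify(path, good)            # passes silently
try:
    verify(path, "0" * 128)   # wrong hash -> error carrying both hashes
except ValueError as e:
    print("detected:", "mismatch" in str(e))
```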
Register brain as an MCP server in your client's configuration:

```json
{
  "mcpServers": {
    "brain": {
      "command": "brain",
      "args": ["daemon"]
    }
  }
}
```

brain distributes plugins for agent runtimes. The plugin surface is canonical — install via the CLI and restart the target client.
```sh
brain plugin install                   # Claude Code (default target)
brain plugin install --target claude   # explicit Claude Code install
brain plugin install --target codex    # Codex install
```

Claude Code installs four domain plugins under `~/.claude/plugins/marketplaces/` and registers them with the `claude` CLI when available. This provides `/tasks:*`, `/mem:*`, `/records:*`, and `/brain:*` commands plus the Brain hooks (SessionStart, UserPromptSubmit, PreCompact, Stop, PreToolUse) without mutating your project's `.claude/settings.json`.
Codex installs one consolidated home-local plugin at `~/.agents/plugins/brain/` and upserts the brain entry in `~/.agents/plugins/marketplace.json`. This exposes the bundled Brain skills for tasks, memory, records, and administration through the Codex plugin marketplace. Configure the Brain MCP server in Codex first so the skills can call the Brain MCP tools.
To remove a plugin:

```sh
brain plugin uninstall                 # Claude Code (default target)
brain plugin uninstall --target codex  # Codex
```

Restart Claude Code or Codex after installing or uninstalling so the client reloads plugin metadata.
`brain hooks install` injects hook entries directly into `.claude/settings.json`. This path is retained for environments where the Claude Code plugin marketplace is unavailable (CI runners, air-gapped machines):

```sh
brain hooks install            # mutates .claude/settings.json
brain hooks install --dry-run  # preview changes without applying
```

Prefer `brain plugin install --target claude` for all standard Claude Code setups.
| Tool | Description |
|---|---|
| `memory.retrieve` | Search notes and retrieve content at the requested level of detail (L0/L1/L2). Unifies search and expansion in one call. Supports hybrid ranking, metadata filters, and cross-brain search. |
| `memory.write_episode` | Record an episodic memory (goal, actions, outcome) with tags and importance. |
| `memory.reflect` | Two-phase: returns source material for synthesis, then stores the agent-generated reflection. |
| `tasks.apply_event` | Apply a task event (create, update, status change, dependency, label, comment) via event sourcing. |
| `tasks.get` | Get a single task by ID with full details, relationships, comments, labels, and linked notes. Supports the `brain` param for cross-brain fetch. |
| `tasks.list` | List tasks filtered by status (all, ready, blocked) or fetch specific tasks by ID. Supports the `brain` param for cross-brain listing. |
| `tasks.close` | Close one or more tasks by ID. Supports the `brain` param for cross-brain close. |
| `tasks.next` | Return the highest-priority ready tasks, sorted by priority or due date. |
| `records.create_document` | Create a document record with `text` (plain) or `data` (base64) content. Embedded, searchable, and included in scope summaries. |
| `records.create_analysis` | Create an analysis record with `text` (plain) or `data` (base64) content. Embedded, searchable, and included in scope summaries. |
| `records.create_plan` | Create a plan record with `text` (plain) or `data` (base64) content. Embedded, searchable, and included in scope summaries. |
| `records.save_snapshot` | Save a snapshot record with `text` (plain) or `data` (base64) content. |
| `records.get` | Get a record by ID with full metadata, tags, and links. Supports prefix resolution and the `brain` param for cross-brain access. |
| `records.list` | List records with optional filters (kind, status, tag, task_id). Supports the `brain` param for cross-brain access. |
| `records.fetch_content` | Fetch the raw content of a record as base64-encoded data. Supports the `brain` param for cross-brain access. |
| `records.archive` | Archive a record (metadata-only operation; payload preserved). |
| `records.tag_add` | Add a tag to a record. Idempotent. |
| `records.tag_remove` | Remove a tag from a record. Idempotent. |
| `records.link_add` | Link a record to a task or note chunk. |
| `records.link_remove` | Remove a link from a record to a task or note chunk. |
| `status` | Get runtime health metrics: latency percentiles, queue depth, stuck files. |
- `records.create_document`, `records.create_analysis`, and `records.create_plan` create typed records with a deterministic policy: they are embedded into LanceDB, searchable via records and memory retrieval, and included in scope summaries.
- `records.save_snapshot` creates snapshot records for opaque state capture; snapshots are stored durably but are not embedded or summarized.
- Other canonical kinds follow the same per-kind policy: `summary` is embedded + summarized + searchable, `implementation` and `review` are embedded + searchable only, and `custom` defaults to embedded + searchable.
Call `memory.retrieve` once with the LOD you need:

- L0 (~100 tokens per result) for scanning many results and orienting cheaply
- L1 (~2,000 tokens per result) for balanced summary + detail, the most common use
- L2 (full content) when you need the complete source

Total cost: 100–2,000 tokens per result depending on LOD, vs. 4,000–8,000 tokens for naive top-k expansion.
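Over MCP, a single retrieval call might look like the following sketch. The argument names (`query`, `lod`, `strategy`, `count`) are assumptions inferred from the CLI flags shown later in this README, not a verified schema; consult the tool's advertised input schema for the real contract.

```python
# Sketch of an MCP tools/call request for memory.retrieve over stdio JSON-RPC.
# Argument names (query, lod, strategy, count) are assumptions inferred from
# the CLI flags; the tool's advertised schema is authoritative.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "memory.retrieve",
        "arguments": {
            "query": "weekly review template",
            "lod": "L0",          # ~100 tokens per result: cheap orientation pass
            "strategy": "lookup",
            "count": 10,
        },
    },
}
line = json.dumps(request)  # MCP stdio transport: one JSON message per line
print(line[:60])
```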
```mermaid
graph TB
    subgraph Notes["Markdown Notes (Source of Truth)"]
        MD[("*.md files")]
    end

    subgraph Daemon["brain daemon"]
        subgraph Ingest["Ingest Pipeline"]
            FW[File Watcher] --> HG[Hash Gate<br/>BLAKE3]
            HG --> MP[Markdown Parser] --> CH[Chunker] --> EM[Embedder<br/>BGE-small]
        end
        subgraph Query["Query Engine"]
            HR[Hybrid Ranker] --- IP[Intent Profiles]
            HR --- TB[Token Budget]
        end
        subgraph Server["MCP Server"]
            SM["memory.*"] ~~~ TA["tasks.*"]
            TA ~~~ RC["records.*"]
            RC ~~~ NL["neural_link.*"]
        end
        subgraph Jobs["Job System"]
            JW[Job Worker] --> JS[Job Store]
        end
    end

    subgraph Storage["Dual Store (brain_persistence)"]
        SQLite["SQLite<br/>FTS5 · metadata · links · tasks · records · jobs<br/>all tables scoped by brain_id"]
        LanceDB["LanceDB<br/>384-dim embeddings (per-brain)"]
    end

    MD -->|fs events| FW
    CH -->|metadata| SQLite
    EM -->|vectors| LanceDB
    Server --> Query
    Query --> SQLite
    Query --> LanceDB
    JW --> SQLite
```
```mermaid
quadrantChart
    title Memory Tiers
    x-axis "High Token Cost" --> "Low Token Cost"
    y-axis "Low Recall" --> "High Recall"
    Tier 1 Episodic - Raw Chunks: [0.15, 0.85]
    Tier 1 Episodic - Full Markdown: [0.2, 0.8]
    Tier 1 Episodic - 384-dim Embeddings: [0.25, 0.75]
    Tier 2 Semantic - Tags & Backlinks: [0.45, 0.55]
    Tier 2 Semantic - Tasks & Timestamps: [0.55, 0.45]
    Tier 3 Procedural - Summaries: [0.75, 0.25]
    Tier 3 Procedural - Reflections: [0.8, 0.2]
    Tier 3 Procedural - 2-sent Stubs: [0.85, 0.15]
```
For full technical details — sequence diagrams, hybrid scoring formula, intent weight profiles, performance targets, storage role separation, and mathematical foundations — see docs/ARCHITECTURE.md.
```sh
brain index    # One-shot index all notes
brain watch    # Watch and index incrementally

brain memory retrieve "weekly review template"            # Search from CLI
brain memory retrieve --strategy planning "next steps"
brain memory retrieve --count 10 "async patterns"

brain daemon start     # Start + register auto-start
brain daemon stop      # Stop + deregister
brain daemon status    # Check daemon state
```

Brains are named containers with independent notes, indexes, and config, managed via a central registry at `~/.brain/`:
```toml
# ~/.brain/state_projection.toml
[brains.personal]
root = "~/notes"
notes = ["~/notes"]

[brains.work]
root = "~/code/my-project"
notes = ["~/code/my-project/docs", "~/code/my-project/notes"]
```

All derived data lives in `~/.brain/brains/<name>/`, not in the project directory.
Tasks can be created, fetched, and closed across brains:

```sh
brain tasks create --title="..." --brain=work                  # Create a task in another brain
brain tasks show <id> --brain=work                             # Fetch task details from another brain
brain tasks list --brain=work                                  # List tasks from another brain
brain tasks close <id> --brain=work                            # Close a task in another brain
brain tasks label batch-add area:infra BRN-01ABC --brain=work  # Add a label to remote tasks
```

Records can also be accessed across brains:

```sh
brain records list --brain=work                  # List records from another brain
brain records get <id> --brain=work              # Get record details from another brain
brain records fetch-content <id> --brain=work    # Fetch record content from another brain
```

The MCP tools `tasks.create`, `tasks.get`, `tasks.list`, `tasks.close`, `tasks_labels_batch`, `records.list`, `records.get`, and `records.fetch_content` all accept an optional `brain` parameter for cross-brain operations.
```sh
just build      # Build
just check      # Type check
just test       # Run tests
just lint       # Lint
just fmt        # Format
just clean      # Clean build artifacts
just clean-db   # Clean database (forces full reindex)
```

```
brain/
  brain_lib/          # Core library: domain logic, ports, MCP server
  brain_persistence/  # Persistence adapters: SQLite, LanceDB, object store
  cli/                # Thin binary: wires CLI commands to library functions
  docs/
    ARCHITECTURE.md
    RECORDS.md
    OPERATIONS.md
    RESEARCH.md
  justfile
```
```sh
just tag patch   # 0.1.0 -> 0.1.1
just tag minor   # 0.1.0 -> 0.2.0
just tag major   # 0.1.0 -> 1.0.0
just changelog
```

- docs/ARCHITECTURE.md — Full technical architecture, sequence diagrams, storage design, mathematical foundations
- docs/OPERATIONS.md — Upgrading, backup, recovery, troubleshooting, performance tuning, model management
- docs/RECORDS.md — Records domain: artifacts, snapshots, content-addressed storage, CLI/MCP reference
- docs/RESEARCH.md — Research survey of agentic memory systems, retrieval design, token budget design
- MemGPT — Virtual context management for long-running agents (Packer et al., 2023)
- Generative Agents — Recency/relevance/importance scoring with reflective synthesis (Park et al., 2023)
- Mem0 — Memory extraction, consolidation, and tiered retrieval
- MCP Specification — AI agent tool integration over stdio JSON-RPC
MIT