diff --git a/README.md b/README.md index d14239b..6eeca9b 100644 --- a/README.md +++ b/README.md @@ -274,7 +274,7 @@ Use them independently or together. The [`@atomicmemory/llmwiki`](https://github ## Contributing -Contributions are welcome. If llmwiki is missing something you need, open an issue or PR and describe the workflow you are trying to support — need-driven improvements are often the best ones. If you want to contribute more generally, roadmap items are a good place to start. For larger changes to core compile, review, import/export, or retrieval semantics, please start with an issue or design discussion so we can align on the contract first. +Contributions are welcome. If llmwiki is missing something you need, open an issue or PR and describe the workflow you are trying to support - need-driven improvements are often the best ones. If you want to contribute more generally, roadmap items are a good place to start. For larger changes to core compile, review, import/export, or retrieval semantics, please start with an issue or design discussion so we can align on the contract first. Before committing code changes, run: diff --git a/docs/AGENTS.md b/docs/AGENTS.md index 50230af..914f27a 100644 --- a/docs/AGENTS.md +++ b/docs/AGENTS.md @@ -33,7 +33,7 @@ ## Style preferences - Use active voice and second person ("you") -- Keep sentences concise — one idea per sentence +- Keep sentences concise - one idea per sentence - Use sentence case for headings - Bold for UI elements: Click **Settings** - Code formatting for file names, commands, paths, and code references diff --git a/docs/README.md b/docs/README.md index 46a7b60..685055e 100644 --- a/docs/README.md +++ b/docs/README.md @@ -14,13 +14,13 @@ Internal refactors do not need docs unless they change an observable contract. ## Where changes belong -- `introduction.mdx` and `quickstart.mdx` — only for first-run or top-level positioning changes. 
-- `cli/*.mdx` — command syntax, flags, examples, output semantics, and command-specific safety notes. -- `configuration/*.mdx` — environment variables, providers, project config, schema, review policy, and defaults. -- `concepts/*.mdx` — durable concepts such as the wiki model, page types, citations, freshness, provenance, and review lifecycle. -- `guides/*.mdx` — end-to-end workflows and integrations. -- `troubleshooting/*.mdx` — common failure modes, recovery steps, and diagnostics. -- `docs.json` — navigation only. Add new pages here or they are effectively unpublished. +- `introduction.mdx` and `quickstart.mdx` - only for first-run or top-level positioning changes. +- `cli/*.mdx` - command syntax, flags, examples, output semantics, and command-specific safety notes. +- `configuration/*.mdx` - environment variables, providers, project config, schema, review policy, and defaults. +- `concepts/*.mdx` - durable concepts such as the wiki model, page types, citations, freshness, provenance, and review lifecycle. +- `guides/*.mdx` - end-to-end workflows and integrations. +- `troubleshooting/*.mdx` - common failure modes, recovery steps, and diagnostics. +- `docs.json` - navigation only. Add new pages here or they are effectively unpublished. ## Feature PR checklist diff --git a/docs/cli/compile.mdx b/docs/cli/compile.mdx index 2736c06..00c38b4 100644 --- a/docs/cli/compile.mdx +++ b/docs/cli/compile.mdx @@ -1,10 +1,10 @@ --- -title: "llmwiki compile — Generate Wiki Pages from Sources" +title: "llmwiki compile - Generate Wiki Pages from Sources" sidebarTitle: "Compile" description: "llmwiki compile runs the incremental two-phase LLM pipeline to extract concepts and generate wiki pages. Key flags: --review and --lang." --- -Compiling is the step that turns raw source files into a structured, interlinked wiki. 
When you run `llmwiki compile`, the pipeline runs in two phases: **Phase 1** reads every changed source in `sources/` and asks the LLM to extract all concepts, entities, and topics it finds. **Phase 2** takes those extracted concepts and generates typed wiki pages — one Markdown file per concept, with YAML frontmatter, paragraph-level citations back to source line ranges, and `[[wikilinks]]` connecting related concepts. +Compiling is the step that turns raw source files into a structured, interlinked wiki. When you run `llmwiki compile`, the pipeline runs in two phases: **Phase 1** reads every changed source in `sources/` and asks the LLM to extract all concepts, entities, and topics it finds. **Phase 2** takes those extracted concepts and generates typed wiki pages - one Markdown file per concept, with YAML frontmatter, paragraph-level citations back to source line ranges, and `[[wikilinks]]` connecting related concepts. Splitting into two phases matters: by completing all extraction before any pages are written, the compiler can merge concepts that appear across multiple sources into a single page, catch extraction failures before anything is committed to disk, and resolve cross-references that would be impossible to wire up in a single pass. @@ -25,7 +25,7 @@ Run from your project root. If `sources/` is empty or doesn't exist yet, compile ## Incremental behaviour -Compile is hash-based and incremental. Each source file's content is fingerprinted on ingest; on subsequent compiles, only sources whose content hash has changed are re-processed through the LLM. Sources that haven't changed are skipped entirely — no API calls, no rewrites. This means re-running `llmwiki compile` after editing a single source touches only the pages that source contributed to, even in a large wiki. +Compile is hash-based and incremental. 
Each source file's content is fingerprinted on ingest; on subsequent compiles, only sources whose content hash has changed are re-processed through the LLM. Sources that haven't changed are skipped entirely - no API calls, no rewrites. This means re-running `llmwiki compile` after editing a single source touches only the pages that source contributed to, even in a large wiki. The same incremental logic applies to embeddings: chunk-level embeddings are updated only for pages whose underlying content changed. @@ -68,7 +68,7 @@ If sources have changed since the last compile, affected pages are marked `stale llmwiki refresh --stale ``` -This recompiles only the sources that own stale pages and cleans up pages whose sources were all deleted (orphaned pages). Unrelated new sources that haven't been compiled yet are deliberately skipped — use `llmwiki compile` to bring those in separately. +This recompiles only the sources that own stale pages and cleans up pages whose sources were all deleted (orphaned pages). Unrelated new sources that haven't been compiled yet are deliberately skipped - use `llmwiki compile` to bring those in separately. ### `--dry-run` @@ -76,7 +76,7 @@ This recompiles only the sources that own stale pages and cleans up pages whose llmwiki refresh --stale --dry-run ``` -Prints the plan — which pages would be recompiled, which sources are involved, how many orphaned pages would be cleaned up — without making any LLM calls or writing any files. Use this to verify the plan before committing. +Prints the plan - which pages would be recompiled, which sources are involved, how many orphaned pages would be cleaned up - without making any LLM calls or writing any files. Use this to verify the plan before committing. 
## `llmwiki watch` @@ -100,7 +100,7 @@ llmwiki review approve llmwiki review reject ``` -**Review policy (selective):** Add a `.llmwiki/config.json` to hold only pages that trip specific risk conditions — low confidence, contradictions, schema violations, or broken provenance — while writing the rest live. See [Review Policy](/configuration/review-policy) for the full configuration reference. +**Review policy (selective):** Add a `.llmwiki/config.json` to hold only pages that trip specific risk conditions - low confidence, contradictions, schema violations, or broken provenance - while writing the rest live. See [Review Policy](/configuration/review-policy) for the full configuration reference. `llmwiki compile` sends source content to your configured LLM provider. Make sure your provider credentials are set before running compile. For Anthropic, set `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`. For other providers, see the [Configuration](/configuration) reference. diff --git a/docs/cli/export.mdx b/docs/cli/export.mdx index 3ab9219..ac9f69c 100644 --- a/docs/cli/export.mdx +++ b/docs/cli/export.mdx @@ -1,12 +1,12 @@ --- -title: "llmwiki export — Export Your Wiki to Portable Formats" +title: "llmwiki export - Export Your Wiki to Portable Formats" sidebarTitle: "Export" description: "llmwiki export converts your compiled wiki to llms.txt, JSON, JSON-LD, GraphML, Marp slides, or an Open Knowledge Format bundle." --- -Your compiled wiki lives on disk as markdown files, but the knowledge inside it is useful beyond the local viewer and CLI. `llmwiki export` transforms your wiki into portable formats — a compact llms.txt file that fits in an LLM context window, a structured JSON envelope for programmatic consumers, a JSON-LD graph for semantic tooling, a GraphML file for graph visualization, a Marp slide deck for presentations, or an Open Knowledge Format bundle for exchange with other knowledge tools. 
+Your compiled wiki lives on disk as markdown files, but the knowledge inside it is useful beyond the local viewer and CLI. `llmwiki export` transforms your wiki into portable formats - a compact llms.txt file that fits in an LLM context window, a structured JSON envelope for programmatic consumers, a JSON-LD graph for semantic tooling, a GraphML file for graph visualization, a Marp slide deck for presentations, or an Open Knowledge Format bundle for exchange with other knowledge tools. -Export is a pure transformation of existing wiki content. It makes no LLM calls and doesn't modify any files in `wiki/` or `sources/` — only writing artifacts under `dist/exports/` or the `--out` directory you choose for directory-style targets. +Export is a pure transformation of existing wiki content. It makes no LLM calls and doesn't modify any files in `wiki/` or `sources/` - only writing artifacts under `dist/exports/` or the `--out` directory you choose for directory-style targets. --- @@ -27,12 +27,12 @@ When you omit `--target`, the single-file formats are exported in one pass. The | Target | Flag value | Output file | What it produces | |--------|-----------|-------------|-----------------| -| llms.txt | `llms-txt` | `dist/exports/llms.txt` | Concise index per the [llmstxt.org](https://llmstxt.org) spec — page titles, summaries, and links. Compact enough to paste into an LLM context window. | -| llms-full.txt | `llms-full-txt` | `dist/exports/llms-full.txt` | Full content version of llms.txt — every page's complete body included. Larger but self-contained. | +| llms.txt | `llms-txt` | `dist/exports/llms.txt` | Concise index per the [llmstxt.org](https://llmstxt.org) spec - page titles, summaries, and links. Compact enough to paste into an LLM context window. | +| llms-full.txt | `llms-full-txt` | `dist/exports/llms-full.txt` | Full content version of llms.txt - every page's complete body included. Larger but self-contained. 
| | JSON | `json` | `dist/exports/wiki.json` | Structured JSON envelope with per-page metadata (kind, confidence, provenance, citations, aliases, freshness). See [JSON export structure](#json-export-structure) below. | | JSON-LD | `json-ld` | `dist/exports/wiki.jsonld` | Schema.org JSON-LD graph for semantic web and knowledge graph tooling. | | GraphML | `graphml` | `dist/exports/wiki.graphml` | Directed wikilink graph as XML. Import into [Gephi](https://gephi.org), Cytoscape, or any GraphML-compatible tool. | -| Marp | `marp` | `dist/exports/wiki.md` | [Marp](https://marp.app) presentation slide deck — one slide per concept page. | +| Marp | `marp` | `dist/exports/wiki.md` | [Marp](https://marp.app) presentation slide deck - one slide per concept page. | | Open Knowledge Format | `okf` | `dist/exports/okf/` | Directory bundle with `index.md`, per-page OKF docs, copied references, and `log.md`. Use this when another tool expects OKF-style markdown bundles. | ### Examples @@ -70,16 +70,16 @@ llmwiki export ## `--project-id ` -The `--project-id` flag embeds a stable identifier in the JSON export envelope. Downstream importers use this ID to derive deterministic external IDs for each page — so if you re-export and re-import, pages map to the same records rather than creating duplicates. +The `--project-id` flag embeds a stable identifier in the JSON export envelope. Downstream importers use this ID to derive deterministic external IDs for each page - so if you re-export and re-import, pages map to the same records rather than creating duplicates. ```bash llmwiki export --target json --project-id my-research-wiki ``` -Valid project IDs match the pattern `/^[a-z0-9][a-z0-9-]{0,62}$/` — lowercase letters, digits, and hyphens, starting with a letter or digit, up to 63 characters. The export is aborted before writing any files if the ID is invalid. 
+Valid project IDs match the pattern `/^[a-z0-9][a-z0-9-]{0,62}$/` - lowercase letters, digits, and hyphens, starting with a letter or digit, up to 63 characters. The export is aborted before writing any files if the ID is invalid. - If you're using the [Atomic Memory bridge](/guides/atomic-memory-bridge), always pass the same `--project-id` value on every export. The bridge derives stable memory record IDs from the project ID and page path — changing the ID will create duplicate records on re-import. + If you're using the [Atomic Memory bridge](/guides/atomic-memory-bridge), always pass the same `--project-id` value on every export. The bridge derives stable memory record IDs from the project ID and page path - changing the ID will create duplicate records on re-import. --- diff --git a/docs/cli/import.mdx b/docs/cli/import.mdx index 1d8151a..9c7cb20 100644 --- a/docs/cli/import.mdx +++ b/docs/cli/import.mdx @@ -1,5 +1,5 @@ --- -title: "llmwiki import — Import Open Knowledge Format Bundles" +title: "llmwiki import - Import Open Knowledge Format Bundles" sidebarTitle: "Import" description: "llmwiki import --okf reads an Open Knowledge Format bundle and stages its documents for review by default. Use --trusted only for bundles you already trust." --- diff --git a/docs/cli/ingest.mdx b/docs/cli/ingest.mdx index 9758bb2..2ec5a05 100644 --- a/docs/cli/ingest.mdx +++ b/docs/cli/ingest.mdx @@ -1,10 +1,10 @@ --- -title: "llmwiki ingest — Add Sources to Your Wiki Project" +title: "llmwiki ingest - Add Sources to Your Wiki Project" sidebarTitle: "Ingest" description: "llmwiki ingest fetches a URL or copies a local file into sources/. Also covers ingest-session for AI session exports and quickstart for one-step setup." --- -Before you can compile a wiki, you need raw material. 
Ingesting a source means pulling external content — a web page, a PDF, a YouTube video transcript, a local Markdown file — into your project's `sources/` directory, where it becomes available for the compile pipeline. Every ingest produces a single Markdown file with YAML frontmatter recording the source URL or path, the detected source type, the ingest timestamp, and a truncation flag if the content exceeded the character limit. That file is the stable, content-addressed record the compiler reads. +Before you can compile a wiki, you need raw material. Ingesting a source means pulling external content - a web page, a PDF, a YouTube video transcript, a local Markdown file - into your project's `sources/` directory, where it becomes available for the compile pipeline. Every ingest produces a single Markdown file with YAML frontmatter recording the source URL or path, the detected source type, the ingest timestamp, and a truncation flag if the content exceeded the character limit. That file is the stable, content-addressed record the compiler reads. You typically ingest first and compile second. If you're just getting started and want to do both in a single step, see [`llmwiki quickstart`](#llmwiki-quickstart) below. @@ -61,7 +61,7 @@ llmwiki ingest ./diagrams/system-architecture.png ### SSRF considerations -`llmwiki ingest` is a server-side fetch primitive — it follows the URL you give it and reads files from disk. **Only pass trusted input to `llmwiki ingest`.** If you're building a tool that ingests user-supplied URLs or untrusted strings, use the SDK's `ingestText()` method instead. `ingestText` accepts raw text without making any network requests or filesystem reads, so it is safe for untrusted content. +`llmwiki ingest` is a server-side fetch primitive - it follows the URL you give it and reads files from disk. 
**Only pass trusted input to `llmwiki ingest`.** If you're building a tool that ingests user-supplied URLs or untrusted strings, use the SDK's `ingestText()` method instead. `ingestText` accepts raw text without making any network requests or filesystem reads, so it is safe for untrusted content. --- @@ -88,7 +88,7 @@ llmwiki ingest-session ./exports/ llmwiki compile ``` -Session files land in `sources/` just like any other ingest result. The compile pipeline treats them identically to web or file sources — concepts are extracted from the conversation text and turned into wiki pages. +Session files land in `sources/` just like any other ingest result. The compile pipeline treats them identically to web or file sources - concepts are extracted from the conversation text and turned into wiki pages. --- @@ -98,7 +98,7 @@ Session files land in `sources/` just like any other ingest result. The compile llmwiki quickstart ``` -`quickstart` is a first-run convenience wrapper that ingests a source, compiles the wiki, and opens the local viewer — all in one command. It is the fastest way to go from zero to a browsable wiki. +`quickstart` is a first-run convenience wrapper that ingests a source, compiles the wiki, and opens the local viewer - all in one command. It is the fastest way to go from zero to a browsable wiki. Under the hood, `quickstart` runs `ingest` then `compile`. Compile requires LLM credentials; if credentials are missing, the source is still saved to disk before quickstart reports the compile failure, so your content is preserved and you can run `llmwiki compile` manually once credentials are configured. @@ -132,5 +132,5 @@ llmwiki quickstart ./brief.md --json ``` -After ingesting sources, run `llmwiki next` to get a recommended next action tailored to your project's current state — it reads your `sources/`, `wiki/`, and candidate queue to suggest whether you should compile, review candidates, run lint, or query. 
+After ingesting sources, run `llmwiki next` to get a recommended next action tailored to your project's current state - it reads your `sources/`, `wiki/`, and candidate queue to suggest whether you should compile, review candidates, run lint, or query. diff --git a/docs/cli/lint-eval.mdx b/docs/cli/lint-eval.mdx index b48318a..2b1af4c 100644 --- a/docs/cli/lint-eval.mdx +++ b/docs/cli/lint-eval.mdx @@ -1,12 +1,12 @@ --- -title: "llmwiki lint and eval — Wiki Quality Checks and Metrics" +title: "llmwiki lint and eval - Wiki Quality Checks and Metrics" sidebarTitle: "Lint & Eval" description: "llmwiki lint finds broken links, stale pages, and citation errors. llmwiki eval measures health scores and citation quality with CI-gateable thresholds." --- A compiled wiki is only as useful as it is accurate. Over time, sources change, pages accumulate broken citations, and the gap between your wiki and your source material widens. `llmwiki lint` gives you an immediate, no-LLM-required report on what's wrong; `llmwiki eval` goes further, producing a quantitative health score you can gate on in CI. -Running both regularly — especially after a `compile` or `refresh` — keeps your wiki trustworthy and prevents quality debt from quietly building up. +Running both regularly - especially after a `compile` or `refresh` - keeps your wiki trustworthy and prevents quality debt from quietly building up. --- @@ -38,8 +38,8 @@ llmwiki lint These two states are often confused: -- **Stale** — the source file still exists, but its content has changed since the last compile. The page may be out of date. -- **Orphaned** — every source that contributed to the page has been deleted. The page has no living owner. +- **Stale** - the source file still exists, but its content has changed since the last compile. The page may be out of date. +- **Orphaned** - every source that contributed to the page has been deleted. The page has no living owner. 
Both are surfaced by `llmwiki lint` and are also visible in the local viewer, JSON export, and MCP tools. Use `llmwiki refresh --stale` to repair stale and orphaned pages with a targeted recompile. @@ -53,7 +53,7 @@ After every run, lint writes a summary to `.llmwiki/last-lint.json`. The local v ### Recommendations -`llmwiki next` reads the lint output to surface actionable recommendations — for example, suggesting `llmwiki refresh --stale` when stale pages are detected, or pointing you toward a specific broken citation to fix. +`llmwiki next` reads the lint output to surface actionable recommendations - for example, suggesting `llmwiki refresh --stale` when stale pages are detected, or pointing you toward a specific broken citation to fix. --- @@ -74,7 +74,7 @@ llmwiki eval --suite full # fast + LLM-as-judge citation support | `full` | Everything in `fast`, plus LLM-as-judge citation support scoring for a sample of claim/source pairs | Yes | - The `fast` suite is safe to run in any environment — including CI pipelines with no API key configured. Reserve `--suite full` for periodic deep checks where you want LLM-judged support scores. + The `fast` suite is safe to run in any environment - including CI pipelines with no API key configured. Reserve `--suite full` for periodic deep checks where you want LLM-judged support scores. ### What eval measures @@ -98,7 +98,7 @@ The fraction of a page's valid sources that are actually cited somewhere in the The fraction of citations that are pinned to specific line ranges (e.g. `^[source.md:42-58]`) rather than citing a whole file. Higher means tighter, more verifiable provenance. **Corpus stats** -Page count, source count, total wiki characters, and embedding counts — appended to `.llmwiki/eval/history.jsonl` for trend tracking. +Page count, source count, total wiki characters, and embedding counts - appended to `.llmwiki/eval/history.jsonl` for trend tracking. 
**Regression deltas** The current run is diffed against the previous entry in `history.jsonl`. Improvements and regressions in every metric are displayed in the report. @@ -119,7 +119,7 @@ llmwiki eval cache clear # wipe the citation judgement cache ### CI thresholds -Add `.llmwiki/eval/thresholds.yaml` to your project to define minimum acceptable scores. When any threshold is violated, `llmwiki eval` exits with a non-zero code and lists the violations in the report — making it suitable for CI gating. +Add `.llmwiki/eval/thresholds.yaml` to your project to define minimum acceptable scores. When any threshold is violated, `llmwiki eval` exits with a non-zero code and lists the violations in the report - making it suitable for CI gating. ```yaml # .llmwiki/eval/thresholds.yaml @@ -155,7 +155,7 @@ Eval writes the following files under `.llmwiki/eval/`: ## llmwiki rules -The `rules` subcommands let you extract and manage machine-actionable rule candidates from your sources. Rule candidates are structured records that can be exported for a downstream rule importer — separate from the prose wiki pages produced by `compile`. +The `rules` subcommands let you extract and manage machine-actionable rule candidates from your sources. Rule candidates are structured records that can be exported for a downstream rule importer - separate from the prose wiki pages produced by `compile`. ```bash llmwiki rules extract # extract rule candidates from changed sources (requires LLM) @@ -167,7 +167,7 @@ llmwiki rules export --scope all # include proposed candidates too llmwiki rules export --scope proposed # only proposed candidates ``` -`rules extract` runs the LLM over your changed sources to identify rule-like statements — constraints, invariants, policies, and similar structured knowledge. Candidates land in `.llmwiki/rule-candidates/` as JSON records. `rules list` shows pending candidates with their confidence score and proposed title so you can decide what to approve. 
+`rules extract` runs the LLM over your changed sources to identify rule-like statements - constraints, invariants, policies, and similar structured knowledge. Candidates land in `.llmwiki/rule-candidates/` as JSON records. `rules list` shows pending candidates with their confidence score and proposed title so you can decide what to approve. All mutations (`approve`, `reject`, `extract`) run under `.llmwiki/lock` to serialize cleanly against concurrent operations. diff --git a/docs/cli/query.mdx b/docs/cli/query.mdx index 34a3135..43f1abb 100644 --- a/docs/cli/query.mdx +++ b/docs/cli/query.mdx @@ -1,10 +1,10 @@ --- -title: "llmwiki query — Ask Questions Against Your Compiled Wiki" +title: "llmwiki query - Ask Questions Against Your Compiled Wiki" sidebarTitle: "Query" description: "llmwiki query answers questions using hybrid semantic search and BM25 reranking over compiled wiki pages. Use --save to persist answers as new pages." --- -Once you've compiled a wiki, `llmwiki query` lets you ask natural language questions and get grounded, cited answers drawn from your compiled pages. Unlike sending a question directly to an LLM, `query` retrieves the most relevant content from your own wiki first and uses that as the grounding context — so answers cite specific pages and trace back to your original sources. +Once you've compiled a wiki, `llmwiki query` lets you ask natural language questions and get grounded, cited answers drawn from your compiled pages. Unlike sending a question directly to an LLM, `query` retrieves the most relevant content from your own wiki first and uses that as the grounding context - so answers cite specific pages and trace back to your original sources. The query pipeline runs in two steps: first it identifies the most relevant pages or chunks for your question using hybrid retrieval, then it generates a grounded answer using only the content those pages contain. 
If the wiki doesn't have enough information to answer your question, the model says so. @@ -38,7 +38,7 @@ llmwiki query "What is multi-head attention and how does it differ from self-att The final evidence pack is the full content of the selected pages, loaded and passed to the LLM as grounding context. When no embedding store is present, the command falls back to sending the full `wiki/index.md` to the LLM for page selection. -If you need wikilink-graph expansion — pulling in pages directly referenced by the top results — use [`llmwiki context`](#llmwiki-context-prompt---json) instead. `context` packages an extended evidence pack with graph neighbors and citation metadata, designed for agent pipelines that build their own prompts. +If you need wikilink-graph expansion - pulling in pages directly referenced by the top results - use [`llmwiki context`](#llmwiki-context-prompt---json) instead. `context` packages an extended evidence pack with graph neighbors and citation metadata, designed for agent pipelines that build their own prompts. ### Example session @@ -71,12 +71,12 @@ llmwiki context "" --json `llmwiki context` and `llmwiki query` both perform hybrid retrieval, but they produce different outputs: -- **`llmwiki context`** packages the evidence — primary pages, semantic chunks, graph neighbors, citations, per-page freshness warnings, and suggested next actions — into a structured evidence pack without generating an answer. The pack is designed to be fed directly to an LLM agent or MCP tool as a token-budgeted context window. +- **`llmwiki context`** packages the evidence - primary pages, semantic chunks, graph neighbors, citations, per-page freshness warnings, and suggested next actions - into a structured evidence pack without generating an answer. The pack is designed to be fed directly to an LLM agent or MCP tool as a token-budgeted context window. - **`llmwiki query`** generates a grounded answer using that same retrieved evidence. 
Use `llmwiki context` when you want to build your own prompt around the retrieved evidence, or when you're integrating wiki retrieval into an agent pipeline. Use `llmwiki query` when you want a ready-to-read answer. -`--json` emits the same `v1` JSON envelope as the MCP `get_context_pack` tool — stable format suitable for agent consumption. +`--json` emits the same `v1` JSON envelope as the MCP `get_context_pack` tool - stable format suitable for agent consumption. ### `--include-sources` @@ -84,7 +84,7 @@ Use `llmwiki context` when you want to build your own prompt around the retrieve llmwiki context "" --include-sources ``` -The `--include-sources` flag is opt-in and appends raw text windows from the ingested source files alongside the compiled wiki page content. Source windows are path-confined to the `sources/` directory — the flag cannot be used to read files outside that boundary. Only enable source windows for agents and pipelines you trust with the full text of your ingested sources. +The `--include-sources` flag is opt-in and appends raw text windows from the ingested source files alongside the compiled wiki page content. Source windows are path-confined to the `sources/` directory - the flag cannot be used to read files outside that boundary. Only enable source windows for agents and pipelines you trust with the full text of your ingested sources. ## Compounding queries with `--save` @@ -101,7 +101,7 @@ llmwiki query "What are the training techniques used for large language models?" llmwiki query "How does RLHF relate to the training techniques we documented?" ``` -Over time, a wiki that accumulates saved query answers becomes progressively richer — later questions can build on earlier ones, and the evidence pack grows to include synthesized knowledge alongside raw source-derived pages. 
+Over time, a wiki that accumulates saved query answers becomes progressively richer - later questions can build on earlier ones, and the evidence pack grows to include synthesized knowledge alongside raw source-derived pages. ### Example with `--debug` @@ -120,7 +120,7 @@ $ llmwiki query "How does positional encoding work?" --debug ``` -`llmwiki query` uses semantic search when an embedding store exists at `.llmwiki/embeddings.json`. If no embedding store is present — for example, when using a provider that doesn't support embeddings, such as GitHub Copilot — query falls back to lexical (full-index) ranking and surfaces a warning. Compile your wiki first to build the embedding store; it is generated automatically during `llmwiki compile`. +`llmwiki query` uses semantic search when an embedding store exists at `.llmwiki/embeddings.json`. If no embedding store is present - for example, when using a provider that doesn't support embeddings, such as GitHub Copilot - query falls back to lexical (full-index) ranking and surfaces a warning. Compile your wiki first to build the embedding store; it is generated automatically during `llmwiki compile`. For using `query` and `context` via an AI agent, see [MCP Agent Integration](/guides/mcp-agent-integration). diff --git a/docs/cli/review.mdx b/docs/cli/review.mdx index 118797a..953d0ac 100644 --- a/docs/cli/review.mdx +++ b/docs/cli/review.mdx @@ -1,10 +1,10 @@ --- -title: "Review Queue — Inspect and Approve Generated Wiki Pages" +title: "Review Queue - Inspect and Approve Generated Wiki Pages" sidebarTitle: "Review" description: "Use llmwiki compile --review and the review subcommands to inspect, approve, or reject generated pages before they land in your wiki." --- -By default, `llmwiki compile` writes pages directly to `wiki/`. 
That's fine for a personal knowledge base where you trust the compiler's output — but when you're building something authoritative, populating a shared wiki, or working with sources that contain conflicting or uncertain information, you may want to review each generated page before it goes live. +By default, `llmwiki compile` writes pages directly to `wiki/`. That's fine for a personal knowledge base where you trust the compiler's output - but when you're building something authoritative, populating a shared wiki, or working with sources that contain conflicting or uncertain information, you may want to review each generated page before it goes live. The review queue gives you that control. Instead of writing pages, compile deposits them as JSON candidate records in `.llmwiki/candidates/`. You inspect each one, then either approve it (which writes it to `wiki/` and refreshes the index) or reject it (which archives it without touching the wiki). The wiki only changes when you say so. @@ -21,13 +21,13 @@ llmwiki compile --review When the compile finishes, it reports how many pages were written and how many were held: ``` -Wrote 8 page(s), held 2 for review — run `llmwiki review list` +Wrote 8 page(s), held 2 for review - run `llmwiki review list` ``` Pages that were already live and haven't changed are not affected. Only newly generated or updated candidates are held. - `--review` is all-or-nothing on the command line — every generated page goes to the queue. For more granular control (e.g. automatically hold only low-confidence or contradicted pages while writing the rest live), see [Review Policy](/configuration/review-policy). + `--review` is all-or-nothing on the command line - every generated page goes to the queue. For more granular control (e.g. automatically hold only low-confidence or contradicted pages while writing the rest live), see [Review Policy](/configuration/review-policy). 
--- @@ -42,7 +42,7 @@ List all pending candidates: llmwiki review list ``` -Each row shows the candidate ID, slug, review mode, reason codes, generation timestamp, and contributing sources. The reason codes tell you why the page was held — useful when you're using a review policy that holds pages selectively. +Each row shows the candidate ID, slug, review mode, reason codes, generation timestamp, and contributing sources. The reason codes tell you why the page was held - useful when you're using a review policy that holds pages selectively. ### `llmwiki review show ` @@ -54,14 +54,14 @@ llmwiki review show The output includes: -- **Title, slug, and summary** — the page's metadata -- **Sources** — which source files contributed to this page -- **Review mode and reason codes** — why the page was held, with detail where available -- **Confidence** — the LLM-reported confidence score, if present -- **Contradiction flag** — whether the page declares `contradictedBy` entries -- **Full page body** — the complete markdown content, exactly as it would be written to `wiki/` -- **Schema violations** — any cross-link rules the page fails, if a schema is configured -- **Provenance violations** — any broken or malformed citation markers +- **Title, slug, and summary** - the page's metadata +- **Sources** - which source files contributed to this page +- **Review mode and reason codes** - why the page was held, with detail where available +- **Confidence** - the LLM-reported confidence score, if present +- **Contradiction flag** - whether the page declares `contradictedBy` entries +- **Full page body** - the complete markdown content, exactly as it would be written to `wiki/` +- **Schema violations** - any cross-link rules the page fails, if a schema is configured +- **Provenance violations** - any broken or malformed citation markers Use `review show` to read the proposed page before deciding to approve or reject it. @@ -83,10 +83,10 @@ Approval does the following in order: 6. 
Removes the candidate file from `.llmwiki/candidates/` 7. Records the approved slug in state so future compiles track it correctly -The page body stored in the candidate is written verbatim — approval never re-invokes the LLM. +The page body stored in the candidate is written verbatim - approval never re-invokes the LLM. - Embeddings refresh may fail if no provider credentials are configured at approval time. When that happens, a warning is printed but the approval still succeeds — the page is written and the index is updated. Re-run `llmwiki compile` or `llmwiki refresh --stale` later to pick up the missing embeddings. + Embeddings refresh may fail if no provider credentials are configured at approval time. When that happens, a warning is printed but the approval still succeeds - the page is written and the index is updated. Re-run `llmwiki compile` or `llmwiki refresh --stale` later to pick up the missing embeddings. ### `llmwiki review reject ` @@ -97,7 +97,7 @@ Archive a candidate without touching the wiki: llmwiki review reject ``` -Rejected candidates are moved to `.llmwiki/candidates/archive/`. They no longer appear in `review list`, but they remain on disk for audit purposes. A rejected candidate for an unchanged source won't be re-extracted on the next `compile` — the rejection is sticky until the source itself changes. +Rejected candidates are moved to `.llmwiki/candidates/archive/`. They no longer appear in `review list`, but they remain on disk for audit purposes. A rejected candidate for an unchanged source won't be re-extracted on the next `compile` - the rejection is sticky until the source itself changes. --- @@ -113,7 +113,7 @@ Every candidate records **why** it was held. 
The reason codes surface in `review | `schema-violating` | Page fails a schema cross-link rule | | `provenance-violating` | Page has broken or malformed citations | -When you're using a [review policy](/configuration/review-policy), only pages that trip an enabled reason code are held — the rest are written live. +When you're using a [review policy](/configuration/review-policy), only pages that trip an enabled reason code are held - the rest are written live. --- diff --git a/docs/cli/serve.mdx b/docs/cli/serve.mdx index 586989d..52c90d6 100644 --- a/docs/cli/serve.mdx +++ b/docs/cli/serve.mdx @@ -1,12 +1,12 @@ --- -title: "llmwiki serve — MCP Server for AI Agent Integration" +title: "llmwiki serve - MCP Server for AI Agent Integration" sidebarTitle: "Serve" description: "llmwiki serve starts an MCP server exposing the full compile pipeline to Claude Desktop, Cursor, Claude Code, and any MCP-compatible agent." --- `llmwiki serve` turns your compiled wiki into a live capability for AI agents. It starts an [MCP (Model Context Protocol)](https://modelcontextprotocol.io) server over stdio, exposing the full llmwiki pipeline as structured tools and read-only resources that any MCP-compatible client can call. -With the server running, an agent can ingest new sources, trigger a compile, query the wiki for grounded answers, retrieve citation-aware context packs, run quality checks, and read individual pages — all without touching the CLI or parsing terminal output. Read-only tools work immediately with no credentials; tools that call an LLM check for a configured provider on each invocation. +With the server running, an agent can ingest new sources, trigger a compile, query the wiki for grounded answers, retrieve citation-aware context packs, run quality checks, and read individual pages - all without touching the CLI or parsing terminal output. Read-only tools work immediately with no credentials; tools that call an LLM check for a configured provider on each invocation. 
--- @@ -17,7 +17,7 @@ llmwiki serve # serve the current directory llmwiki serve --root /path/to/wiki # serve a specific project root ``` -The server uses stdio transport. It starts immediately and doesn't require LLM credentials at startup — only tools that make LLM calls validate credentials lazily when they're invoked. +The server uses stdio transport. It starts immediately and doesn't require LLM credentials at startup - only tools that make LLM calls validate credentials lazily when they're invoked. --- @@ -39,7 +39,7 @@ To connect Claude Desktop, Cursor, or Claude Code, add llmwiki to your MCP clien } ``` -For Claude Desktop this goes in `claude_desktop_config.json`. For Cursor, add it to your MCP settings file. The `env` block is where you supply your LLM provider credentials — the server process inherits them from there. +For Claude Desktop this goes in `claude_desktop_config.json`. For Cursor, add it to your MCP settings file. The `env` block is where you supply your LLM provider credentials - the server process inherits them from there. If you're using the `claude-agent` provider with a local Claude Code login, you can omit `ANTHROPIC_API_KEY` from the `env` block. Set `LLMWIKI_PROVIDER=claude-agent` instead and the server will authenticate through your existing Claude Code session. @@ -100,12 +100,12 @@ The server exposes 7 read-only resources under the `llmwiki://` URI scheme. MCP | URI | Returns | |-----|---------| -| `llmwiki://index` | Full content of `wiki/index.md` — the auto-generated table of contents. | -| `llmwiki://concept/{slug}` | A single concept page from `wiki/concepts/` — parsed frontmatter plus body. | -| `llmwiki://query/{slug}` | A single saved query page from `wiki/queries/` — parsed frontmatter plus body. | +| `llmwiki://index` | Full content of `wiki/index.md` - the auto-generated table of contents. | +| `llmwiki://concept/{slug}` | A single concept page from `wiki/concepts/` - parsed frontmatter plus body. 
| +| `llmwiki://query/{slug}` | A single saved query page from `wiki/queries/` - parsed frontmatter plus body. | | `llmwiki://sources` | List of ingested source files with frontmatter metadata (filename, truncation flag, source URL, etc.). | -| `llmwiki://state` | Compilation state from `.llmwiki/state.json` — per-source content hashes, live concept slugs, and last compile times. | -| `llmwiki://eval/report` | The most recent eval report — health score, citation coverage, corpus stats. | +| `llmwiki://state` | Compilation state from `.llmwiki/state.json` - per-source content hashes, live concept slugs, and last compile times. | +| `llmwiki://eval/report` | The most recent eval report - health score, citation coverage, corpus stats. | | `llmwiki://eval/history` | Trend of past eval runs from `history.jsonl`. | --- @@ -119,7 +119,7 @@ llmwiki next # print a human-readable recommendation llmwiki next --json # emit a stable JSON envelope for agent consumption ``` -`llmwiki next` reads the current project state — including the lint cache, pending source changes, and review queue depth — and recommends the single most useful next action. With `--json`, the output is a stable envelope with a `command` field, an `args` array, and a `reason` string, so an agent can execute the recommendation programmatically without parsing prose. +`llmwiki next` reads the current project state - including the lint cache, pending source changes, and review queue depth - and recommends the single most useful next action. With `--json`, the output is a stable envelope with a `command` field, an `args` array, and a `reason` string, so an agent can execute the recommendation programmatically without parsing prose. `llmwiki next` is read-only. It never modifies the workspace. It's safe to call as a status check at any point in an agent workflow. 
@@ -127,4 +127,4 @@ llmwiki next --json # emit a stable JSON envelope for agent consumption --- -For a full walkthrough of connecting llmwiki to Claude Desktop, Cursor, and Claude Code — including multi-project setups and agent workflow patterns — see the [MCP Agent Integration guide](/guides/mcp-agent-integration). +For a full walkthrough of connecting llmwiki to Claude Desktop, Cursor, and Claude Code - including multi-project setups and agent workflow patterns - see the [MCP Agent Integration guide](/guides/mcp-agent-integration). diff --git a/docs/cli/view.mdx b/docs/cli/view.mdx index b26c7d0..f3322ee 100644 --- a/docs/cli/view.mdx +++ b/docs/cli/view.mdx @@ -1,10 +1,10 @@ --- -title: "llmwiki view — Browse Your Wiki in a Local Web Viewer" +title: "llmwiki view - Browse Your Wiki in a Local Web Viewer" sidebarTitle: "View" description: "llmwiki view starts a private local web server for browsing, searching, and inspecting your compiled wiki with provenance chips and a graph view." --- -After compiling your wiki, `llmwiki view` gives you a private local web interface for browsing, searching, and inspecting what was generated. The viewer renders your compiled Markdown pages with their frontmatter metadata, shows citation and provenance chips for each claim, provides a full-text search across all pages, and includes a force-directed graph at `#/graph` for exploring how concepts link to each other. Everything runs locally — no data leaves your machine. +After compiling your wiki, `llmwiki view` gives you a private local web interface for browsing, searching, and inspecting what was generated. The viewer renders your compiled Markdown pages with their frontmatter metadata, shows citation and provenance chips for each claim, provides a full-text search across all pages, and includes a force-directed graph at `#/graph` for exploring how concepts link to each other. Everything runs locally - no data leaves your machine. 
## Starting the viewer @@ -41,7 +41,7 @@ Press `Ctrl+C` to stop the server. **Markdown rendering.** Pages are rendered with full Markdown formatting, including headers, code blocks, tables, and `[[wikilink]]` resolution. Wikilinks are clickable and resolve to any page that matches either the linked title or an entry in the page's `aliases` frontmatter. -**Page metadata.** Each page shows its frontmatter fields — kind, contributing sources, confidence score, provenance state, creation and update timestamps — in a collapsible metadata panel. +**Page metadata.** Each page shows its frontmatter fields - kind, contributing sources, confidence score, provenance state, creation and update timestamps - in a collapsible metadata panel. **Health counts.** The sidebar header shows the current count of `STALE`, `ORPHANED`, `CONTRADICTED`, and `ARCHIVED` pages from the most recent lint run, giving you an at-a-glance freshness signal without leaving the viewer. @@ -53,7 +53,7 @@ Press `Ctrl+C` to stop the server. ## Security model -The viewer is **read-only by design** — it renders compiled wiki pages but cannot write to `sources/`, `wiki/`, or `.llmwiki/`. All write operations go through the CLI or the MCP server. +The viewer is **read-only by design** - it renders compiled wiki pages but cannot write to `sources/`, `wiki/`, or `.llmwiki/`. All write operations go through the CLI or the MCP server. By default, the server binds exclusively to `127.0.0.1` (loopback), so it is only accessible from your local machine. Viewer responses use a strict local-asset Content Security Policy that prevents the UI from loading resources from external origins. @@ -75,7 +75,7 @@ bind beyond loopback, or neither to keep the viewer on 127.0.0.1. Wildcard hosts (`0.0.0.0`, `::`, `*`) are rejected regardless of `--allow-lan`. Use a specific interface IP instead. 
-LAN mode exposes your compiled wiki — including all page content, citations, and source references — to every device that can reach the specified IP on your network. Only enable LAN mode for devices and agents you trust with the contents of your wiki. If you need to share a wiki externally, export it with `llmwiki export` instead and serve the static output through a controlled channel. +LAN mode exposes your compiled wiki - including all page content, citations, and source references - to every device that can reach the specified IP on your network. Only enable LAN mode for devices and agents you trust with the contents of your wiki. If you need to share a wiki externally, export it with `llmwiki export` instead and serve the static output through a controlled channel. ## Example workflow @@ -93,7 +93,7 @@ llmwiki lint # Repair stale pages llmwiki refresh --stale -# Reload the viewer — it serves the latest compiled state on each page load +# Reload the viewer - it serves the latest compiled state on each page load ``` diff --git a/docs/concepts/citations.mdx b/docs/concepts/citations.mdx index c17bd94..8b79625 100644 --- a/docs/concepts/citations.mdx +++ b/docs/concepts/citations.mdx @@ -4,7 +4,7 @@ sidebarTitle: "Citations" description: "llmwiki traces every paragraph and claim back to the source file and line range. Learn how paragraph and claim-level citations work." --- -When you compile a wiki, you are trusting an LLM to synthesize knowledge from your sources. That trust is only well-placed if you can trace every claim back to where it came from. llmwiki builds provenance tracing into the page format itself: paragraphs carry lightweight citation markers pointing back to the source file that contributed them, and specific claims can pin to exact line ranges within that file. This means you can open any compiled page, see a citation marker, and know precisely which source — and which lines of that source — the content derives from. 
`llmwiki lint` validates every citation on every run, and `llmwiki eval` measures how thoroughly and accurately your pages are cited. +When you compile a wiki, you are trusting an LLM to synthesize knowledge from your sources. That trust is only well-placed if you can trace every claim back to where it came from. llmwiki builds provenance tracing into the page format itself: paragraphs carry lightweight citation markers pointing back to the source file that contributed them, and specific claims can pin to exact line ranges within that file. This means you can open any compiled page, see a citation marker, and know precisely which source - and which lines of that source - the content derives from. `llmwiki lint` validates every citation on every run, and `llmwiki eval` measures how thoroughly and accurately your pages are cited. ## Paragraph-Level Citations @@ -18,11 +18,11 @@ The two-phase compile pipeline separates concept extraction from page generation so that cross-source merges happen deterministically. ^[architecture-notes.md] ``` -The filename inside `^[...]` is relative to the `sources/` directory. You do not include the `sources/` prefix — just the bare filename as it appears in the `sources` frontmatter field. +The filename inside `^[...]` is relative to the `sources/` directory. You do not include the `sources/` prefix - just the bare filename as it appears in the `sources` frontmatter field. ## Claim-Level Citations -For claims that require tighter verification — specific numbers, precise technical assertions, direct quotations — you can pin a citation to a line range within the source file. llmwiki supports two equivalent syntaxes: +For claims that require tighter verification - specific numbers, precise technical assertions, direct quotations - you can pin a citation to a line range within the source file. llmwiki supports two equivalent syntaxes: ```markdown Colon range syntax @@ -34,7 +34,7 @@ The system uses a two-phase compile pipeline. 
^[architecture-notes.md#L42-L58] ``` -Both forms identify the same span: lines 42 through 58 (inclusive) of `architecture-notes.md` in the `sources/` directory. Use whichever form your team prefers — llmwiki's linter and eval harness treat them identically. +Both forms identify the same span: lines 42 through 58 (inclusive) of `architecture-notes.md` in the `sources/` directory. Use whichever form your team prefers - llmwiki's linter and eval harness treat them identically. Claim-level citations are tracked by `llmwiki eval` as the `claim_level_citation_rate` metric: the fraction of all citations in the wiki that pin to a specific line range rather than a whole file. Higher rates mean tighter, more verifiable provenance. You can set a minimum threshold in `.llmwiki/eval/thresholds.yaml`. @@ -49,7 +49,7 @@ Both forms identify the same span: lines 42 through 58 (inclusive) of `architect The filename inside `^[...]` does not exist in `sources/`. This happens when a source is deleted after compilation or when the LLM hallucinated a filename. Treated as an **error**. - The citation syntax is not parseable — for example, `^[file.md:abc]` where the range is not a pair of integers. Treated as an **error**. + The citation syntax is not parseable - for example, `^[file.md:abc]` where the range is not a pair of integers. Treated as an **error**. A line range where the start line is `0` (lines are 1-indexed), or where the end line is less than the start line (e.g. `^[file.md:8-3]`). Treated as an **error**. @@ -115,10 +115,10 @@ Related concepts: [[Compilation Pipeline]], [[Incremental Compilation]], [[Wikil `llmwiki eval` measures citation quality across the entire wiki as part of its health score: -- **Citation coverage** — the fraction of prose paragraphs in `wiki/concepts/` that carry at least one `^[...]` marker. Low coverage means paragraphs are floating without provenance. 
-- **Citation precision** — the fraction of `^[...]` markers that point to a source file that actually exists in `sources/`. A precision below 100% indicates hallucinated or deleted source references. -- **Citation support** (`--suite full`) — samples up to N `(claim, source span)` pairs and asks a judge model to score each 0–2 (unsupported → fully supported). Results are cached in `.llmwiki/eval/citation-cache.jsonl` so re-runs only judge new pairs. -- **Claim-level citation rate** — the fraction of all citations that use a line-range form rather than a bare filename. +- **Citation coverage** - the fraction of prose paragraphs in `wiki/concepts/` that carry at least one `^[...]` marker. Low coverage means paragraphs are floating without provenance. +- **Citation precision** - the fraction of `^[...]` markers that point to a source file that actually exists in `sources/`. A precision below 100% indicates hallucinated or deleted source references. +- **Citation support** (`--suite full`) - samples up to N `(claim, source span)` pairs and asks a judge model to score each 0–2 (unsupported → fully supported). Results are cached in `.llmwiki/eval/citation-cache.jsonl` so re-runs only judge new pairs. +- **Claim-level citation rate** - the fraction of all citations that use a line-range form rather than a bare filename. You can set minimum thresholds for all of these in `.llmwiki/eval/thresholds.yaml` to gate CI pipelines on citation quality. diff --git a/docs/concepts/how-it-works.mdx b/docs/concepts/how-it-works.mdx index 0ee5a48..4b43f8c 100644 --- a/docs/concepts/how-it-works.mdx +++ b/docs/concepts/how-it-works.mdx @@ -4,7 +4,7 @@ sidebarTitle: "How It Works" description: "Understand llmwiki's two-phase LLM pipeline: concept extraction, page generation, incremental change detection, and hybrid retrieval." 
--- -Most knowledge management tools retrieve information at query time — every question re-discovers the same relationships from scratch, and the structure never accumulates. llmwiki takes the opposite approach: it compiles your sources into a persistent, interlinked wiki artifact **before** any query runs. Concepts get their own typed pages. Content shared across multiple sources is merged into one page rather than competing as duplicate chunks. Pages link to each other via `[[wikilinks]]`. When you query with `--save`, the answer becomes a new page and future queries use it as context. Embeddings, BM25 reranking, and wikilink-graph expansion then run over this compiled artifact, narrowing hundreds of pages to a tight, citation-traceable evidence pack. +Most knowledge management tools retrieve information at query time - every question re-discovers the same relationships from scratch, and the structure never accumulates. llmwiki takes the opposite approach: it compiles your sources into a persistent, interlinked wiki artifact **before** any query runs. Concepts get their own typed pages. Content shared across multiple sources is merged into one page rather than competing as duplicate chunks. Pages link to each other via `[[wikilinks]]`. When you query with `--save`, the answer becomes a new page and future queries use it as context. Embeddings, BM25 reranking, and wikilink-graph expansion then run over this compiled artifact, narrowing hundreds of pages to a tight, citation-traceable evidence pack. ## The Compile Pipeline @@ -25,10 +25,10 @@ sources/ → hash check → LLM concept extraction → page generation llmwiki splits compilation into two distinct phases rather than processing each source end-to-end. - - Every changed source is sent to the LLM, which identifies and extracts the key concepts each source contains. All extractions complete before any page is written. 
This means the compiler knows the full concept universe — including which concepts appear in multiple sources — before it commits to writing a single file. + + Every changed source is sent to the LLM, which identifies and extracts the key concepts each source contains. All extractions complete before any page is written. This means the compiler knows the full concept universe - including which concepts appear in multiple sources - before it commits to writing a single file. - + For each extracted concept, the LLM generates a structured wiki page with YAML frontmatter, prose body, and `[[wikilinks]]` to related pages. Concepts claimed by more than one source are merged into a single page at this stage instead of producing duplicate files. @@ -41,7 +41,7 @@ llmwiki avoids re-processing unchanged work at every layer of the pipeline. - Every file in `sources/` is SHA-256 hashed and compared against `.llmwiki/state.json`. Only sources whose hash changed — or that are brand new — flow through the LLM pipeline. + Every file in `sources/` is SHA-256 hashed and compared against `.llmwiki/state.json`. Only sources whose hash changed - or that are brand new - flow through the LLM pipeline. Chunk embeddings in `.llmwiki/embeddings.json` are content-hash-aware. Re-running on an unchanged corpus skips all embedding work. @@ -70,15 +70,15 @@ Retrieval runs over the compiled wiki, not raw source chunks. - When no embedding store is present — for example, when using the GitHub Copilot provider, which has no embeddings endpoint — llmwiki automatically falls back to lexical (index-based) ranking and surfaces a stable warning code rather than hard-failing. + When no embedding store is present - for example, when using the GitHub Copilot provider, which has no embeddings endpoint - llmwiki automatically falls back to lexical (index-based) ranking and surfaces a stable warning code rather than hard-failing. 
## Source Freshness -Every compiled page records the sources — and their SHA-256 content hashes — that produced it. On any later command, llmwiki compares those recorded hashes against `sources/` on disk: +Every compiled page records the sources - and their SHA-256 content hashes - that produced it. On any later command, llmwiki compares those recorded hashes against `sources/` on disk: -- **Stale** — a page whose recorded source hashes no longer match the files on disk. The source still exists, but its content changed since the last compile. -- **Orphaned** — a page whose recorded sources were all deleted from `sources/`. +- **Stale** - a page whose recorded source hashes no longer match the files on disk. The source still exists, but its content changed since the last compile. +- **Orphaned** - a page whose recorded sources were all deleted from `sources/`. `llmwiki lint`, `llmwiki status`, the local viewer, the JSON export, and the MCP `wiki_status` tool all surface stale and orphaned pages without triggering a recompile. To repair them, run `llmwiki refresh --stale`: it recompiles only the changed sources that own stale pages and cleans up orphaned pages, deliberately leaving unrelated new sources for a full `llmwiki compile`. Use `--dry-run` to preview the repair plan with no LLM calls or writes. diff --git a/docs/concepts/karpathy-pattern.mdx b/docs/concepts/karpathy-pattern.mdx index 86309ff..bb6c0c6 100644 --- a/docs/concepts/karpathy-pattern.mdx +++ b/docs/concepts/karpathy-pattern.mdx @@ -40,12 +40,12 @@ llmwiki makes the pattern operational with a two-phase compile pipeline: Compiling first creates leverage that raw retrieval does not naturally provide: -- **Durability** — pages remain on disk, can be inspected, reviewed, edited, exported, and versioned. -- **Accumulation** — saved query answers and newly ingested sources enrich the wiki over time. -- **Citation tracing** — paragraphs and claims point back to source files and line ranges. 
-- **Reviewability** — risky pages can be held as candidates before they become live context. -- **Freshness tracking** — source changes can mark compiled pages stale or orphaned. -- **Interoperability** — compiled knowledge can be exported to JSON, JSON-LD, GraphML, Marp, `llms.txt`, and Open Knowledge Format. +- **Durability** - pages remain on disk, can be inspected, reviewed, edited, exported, and versioned. +- **Accumulation** - saved query answers and newly ingested sources enrich the wiki over time. +- **Citation tracing** - paragraphs and claims point back to source files and line ranges. +- **Reviewability** - risky pages can be held as candidates before they become live context. +- **Freshness tracking** - source changes can mark compiled pages stale or orphaned. +- **Interoperability** - compiled knowledge can be exported to JSON, JSON-LD, GraphML, Marp, `llms.txt`, and Open Knowledge Format. llmwiki still uses retrieval. The difference is that retrieval runs over the compiled wiki: semantic chunk search, BM25 reranking, and wikilink-graph expansion select evidence from a structured artifact. diff --git a/docs/concepts/page-types.mdx b/docs/concepts/page-types.mdx index 19fd1e7..035155f 100644 --- a/docs/concepts/page-types.mdx +++ b/docs/concepts/page-types.mdx @@ -1,10 +1,10 @@ --- title: "Wiki Page Kinds: concept, entity, comparison, overview" sidebarTitle: "Page Types" -description: "Learn the four page kinds — concept, entity, comparison, overview — their lint expectations, and the epistemic metadata every compiled page can carry." +description: "Learn the four page kinds - concept, entity, comparison, overview - their lint expectations, and the epistemic metadata every compiled page can carry." --- -Without typed pages, a compiled wiki quickly becomes a flat list of topics with no signal about what role each page plays. A page about "Transformer" could be a conceptual explanation, a specific model card for GPT-4, a side-by-side of Transformer vs. 
RNN, or a domain map tying together all attention-related pages. llmwiki's four page kinds give the compiler — and you — a vocabulary for these distinctions. Each kind carries different linting expectations (such as minimum wikilink counts you can configure in `schema.json`), surfaces differently in the local viewer, and participates differently in the review policy. +Without typed pages, a compiled wiki quickly becomes a flat list of topics with no signal about what role each page plays. A page about "Transformer" could be a conceptual explanation, a specific model card for GPT-4, a side-by-side of Transformer vs. RNN, or a domain map tying together all attention-related pages. llmwiki's four page kinds give the compiler - and you - a vocabulary for these distinctions. Each kind carries different linting expectations (such as minimum wikilink counts you can configure in `schema.json`), surfaces differently in the local viewer, and participates differently in the review policy. ## The Four Page Kinds @@ -15,7 +15,7 @@ Without typed pages, a compiled wiki quickly becomes a flat list of topics with **Examples:** `self-attention`, `knowledge-compilation`, `incremental-compilation` - A specific, named thing that exists in the world — a person, organization, product, model, or artifact. Entity pages are about *particular instances* rather than abstract patterns. They carry unique identifying information and often link outward to the concepts they instantiate. + A specific, named thing that exists in the world - a person, organization, product, model, or artifact. Entity pages are about *particular instances* rather than abstract patterns. They carry unique identifying information and often link outward to the concepts they instantiate. **Examples:** `andrej-karpathy`, `gpt-4`, `anthropic`, `attention-is-all-you-need` @@ -59,15 +59,15 @@ updatedAt: "2026-04-05T12:00:00Z" - A number between `0` and `1` representing the LLM's reported confidence in the synthesized page. 
Higher values indicate the page is well-supported by its sources with minimal ambiguity. Lower values — by default, below `0.5` — cause `llmwiki lint` to flag the page with a `low-confidence` warning and, if the review policy is active, hold the page for review rather than writing it to `wiki/`. + A number between `0` and `1` representing the LLM's reported confidence in the synthesized page. Higher values indicate the page is well-supported by its sources with minimal ambiguity. Lower values - by default, below `0.5` - cause `llmwiki lint` to flag the page with a `low-confidence` warning and, if the review policy is active, hold the page for review rather than writing it to `wiki/`. A categorical label describing how the page's content was produced: - - `extracted` — derived directly from one source - - `merged` — synthesized from multiple sources - - `inferred` — the LLM drew a conclusion not stated explicitly in any source - - `ambiguous` — sources gave conflicting signals; the page reflects the compiler's best synthesis + - `extracted` - derived directly from one source + - `merged` - synthesized from multiple sources + - `inferred` - the LLM drew a conclusion not stated explicitly in any source + - `ambiguous` - sources gave conflicting signals; the page reflects the compiler's best synthesis When multiple sources merge into one slug, the reconciled state is always `merged`. @@ -90,16 +90,16 @@ The review policy in `.llmwiki/config.json` uses epistemic metadata to decide wh - Pages with `confidence` below `lowConfidenceThreshold` (default `0.5`) are held. Pages with no `confidence` field are also held by default — set `"treatMissingConfidenceAs": "ok"` in `config.json` to let them pass. + Pages with `confidence` below `lowConfidenceThreshold` (default `0.5`) are held. Pages with no `confidence` field are also held by default - set `"treatMissingConfidenceAs": "ok"` in `config.json` to let them pass. 
Pages with a non-empty `contradictedBy` array are held, regardless of their confidence score. - Pages that fail a schema cross-link rule — for example, a `concept` page with fewer wikilinks than the schema's `minWikilinks` requires — are held. + Pages that fail a schema cross-link rule - for example, a `concept` page with fewer wikilinks than the schema's `minWikilinks` requires - are held. - Pages with broken or malformed citations — source files that don't exist, line ranges that are impossible or out of bounds — are held. + Pages with broken or malformed citations - source files that don't exist, line ranges that are impossible or out of bounds - are held. diff --git a/docs/concepts/wiki-model.mdx b/docs/concepts/wiki-model.mdx index 3913071..a9e0190 100644 --- a/docs/concepts/wiki-model.mdx +++ b/docs/concepts/wiki-model.mdx @@ -4,7 +4,7 @@ sidebarTitle: "Wiki Model" description: "Learn how llmwiki organizes compiled wiki pages, wikilinks, the activity journal, and the .llmwiki directory for embeddings and state." --- -When llmwiki compiles your sources, it writes everything into a predictable, file-system-native layout. The compiled wiki lives in `wiki/`, the compiler's internal state lives in `.llmwiki/`, and every operation — ingest, compile, query — appends a timestamped entry to `log.md`. All of these are plain text or JSON files you can read, version-control, and open in any tool, including Obsidian. +When llmwiki compiles your sources, it writes everything into a predictable, file-system-native layout. The compiled wiki lives in `wiki/`, the compiler's internal state lives in `.llmwiki/`, and every operation - ingest, compile, query - appends a timestamped entry to `log.md`. All of these are plain text or JSON files you can read, version-control, and open in any tool, including Obsidian. ## Output Directory Structure @@ -34,7 +34,7 @@ wiki/ One Markdown file per compiled concept. 
Each file has YAML frontmatter with title, summary, kind, sources, timestamps, and optional epistemic metadata. The file body contains prose paragraphs with `^[source.md]` citation markers and `[[wikilinks]]` to related concepts. - Saved answers from `llmwiki query --save`. Query pages are full wiki pages and participate in retrieval — future queries use them as context, compounding the wiki's usefulness over time. + Saved answers from `llmwiki query --save`. Query pages are full wiki pages and participate in retrieval - future queries use them as context, compounding the wiki's usefulness over time. An auto-generated table of contents rebuilt after every compile. Lists every concept page with its summary, grouped for navigation. Both the CLI and the local viewer use this as the primary entry point. @@ -48,10 +48,10 @@ wiki/ Tracks per-source SHA-256 content hashes and the concept slugs each source owns. On every compile, llmwiki compares live file hashes against this state to determine what needs reprocessing. This is what makes incremental compilation possible. - The v2 embedding store. Carries page-level and chunk-level vectors used by `llmwiki query` and `llmwiki context` for cosine-similarity retrieval. Updated incrementally alongside source changes — only chunks whose content changed are re-embedded. + The v2 embedding store. Carries page-level and chunk-level vectors used by `llmwiki query` and `llmwiki context` for cosine-similarity retrieval. Updated incrementally alongside source changes - only chunks whose content changed are re-embedded. - JSON records for pages held for review — either via `llmwiki compile --review`, triggered automatically by a review policy in `config.json`, or staged by `llmwiki import --okf`. Each candidate records exactly why it was held (low confidence, contradicted, schema-violating, provenance-violating, or imported from OKF). Rejected candidates move to `candidates/archive/` for audit. 
+ JSON records for pages held for review - either via `llmwiki compile --review`, triggered automatically by a review policy in `config.json`, or staged by `llmwiki import --okf`. Each candidate records exactly why it was held (low confidence, contradicted, schema-violating, provenance-violating, or imported from OKF). Rejected candidates move to `candidates/archive/` for audit. An optional file you create with `llmwiki schema init`. Defines which page kinds are permitted, per-kind minimum wikilink counts, and seed pages the compiler should materialize (such as domain-level overview pages). Projects without a schema file fall back to the `concept` kind for all pages. @@ -79,7 +79,7 @@ aliases: Any `[[multi-head self-attention]]` or `[[MHA]]` link in the wiki resolves to this page, even if the slug is `multi-head-attention`. Alias resolution is honored by the local viewer, the MCP `read_page` tool, and `llmwiki query`. - The wiki is fully Obsidian-compatible. Open the `wiki/` directory as an Obsidian vault to browse compiled pages, follow wikilinks, and view the knowledge graph — no additional configuration required. + The wiki is fully Obsidian-compatible. Open the `wiki/` directory as an Obsidian vault to browse compiled pages, follow wikilinks, and view the knowledge graph - no additional configuration required. ## Source Attribution @@ -98,7 +98,7 @@ updatedAt: "2026-04-05T12:00:00Z" --- ``` -The `sources` field lists the filenames from `sources/` that contributed to this page. These filenames — combined with the content hashes recorded in `.llmwiki/state.json` — are what llmwiki compares on subsequent runs to determine whether a page is fresh, stale, or orphaned. +The `sources` field lists the filenames from `sources/` that contributed to this page. These filenames - combined with the content hashes recorded in `.llmwiki/state.json` - are what llmwiki compares on subsequent runs to determine whether a page is fresh, stale, or orphaned. 
When multiple sources merge into one page, all contributing source filenames appear in the `sources` array. @@ -114,7 +114,7 @@ This imported provenance is used for honest re-export: `llmwiki export --target ## The Activity Journal (`log.md`) -Every ingest, compile, and query operation appends a timestamped entry to `log.md` at the project root. Entries use a fixed heading format — `## [YYYY-MM-DDThh:mm:ssZ] operation | description` — followed by a short bullet body carrying page wikilinks and counts: +Every ingest, compile, and query operation appends a timestamped entry to `log.md` at the project root. Entries use a fixed heading format - `## [YYYY-MM-DDThh:mm:ssZ] operation | description` - followed by a short bullet body carrying page wikilinks and counts: ```markdown ## [2026-06-05T09:14:02Z] ingest | Attention Is All You Need @@ -137,7 +137,7 @@ Because only headings start with `## [`, you can reliably extract recent operati grep "^## \[" log.md | tail -5 ``` -`log.md` tracks temporal progression — when things were compiled and in what order. `wiki/index.md` organizes content for discovery. Both are human-readable and machine-parseable. +`log.md` tracks temporal progression - when things were compiled and in what order. `wiki/index.md` organizes content for discovery. Both are human-readable and machine-parseable. `log.md` is a useful audit trail when running llmwiki through the MCP server or SDK. Agents can read it to understand what has already been compiled, what was recently updated, and which pages were created from a given source. 
diff --git a/docs/configuration/environment-variables.mdx b/docs/configuration/environment-variables.mdx index b0f246f..a09bed2 100644 --- a/docs/configuration/environment-variables.mdx +++ b/docs/configuration/environment-variables.mdx @@ -6,7 +6,7 @@ description: "Complete reference for all llmwiki environment variables: provider llmwiki reads configuration from environment variables and an optional `.env` file placed in your project directory (alongside `sources/` and `wiki/`). Variables set in your shell take precedence over those in `.env`, which in turn take precedence over the Claude Code settings fallback (`~/.claude/settings.json` → `env` block) and built-in defaults. -You don't need to set everything — only the variables relevant to your chosen provider are required. Most projects need only two or three exports before running `llmwiki compile`. +You don't need to set everything - only the variables relevant to your chosen provider are required. Most projects need only two or three exports before running `llmwiki compile`. --- @@ -28,7 +28,7 @@ You don't need to set everything — only the variables relevant to your chosen | `ANTHROPIC_AUTH_TOKEN` | One of the two | Alternative auth token accepted by Anthropic-compatible gateways | | `ANTHROPIC_BASE_URL` | No | Custom base URL for proxies or alternate Claude endpoints. Accepts HTTP(S) URLs including Claude-style path endpoints | -Either `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` satisfies authentication — you do not need both. If neither is set in your environment or `.env`, llmwiki attempts to read these values from `~/.claude/settings.json`. +Either `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` satisfies authentication - you do not need both. If neither is set in your environment or `.env`, llmwiki attempts to read these values from `~/.claude/settings.json`. 
--- @@ -102,7 +102,7 @@ Non-numeric, zero, or negative values are silently ignored and the next source i Place this file in your project root (the same directory that contains `sources/` and `wiki/`): ```bash -# .env — llmwiki project configuration +# .env - llmwiki project configuration # Provider LLMWIKI_PROVIDER=anthropic diff --git a/docs/configuration/providers.mdx b/docs/configuration/providers.mdx index 7b5d1e0..441384e 100644 --- a/docs/configuration/providers.mdx +++ b/docs/configuration/providers.mdx @@ -4,14 +4,14 @@ sidebarTitle: "Providers" description: "Configure llmwiki to use Anthropic, OpenAI-compatible endpoints, Ollama, GitHub Copilot, or the Claude Agent SDK provider for local login." --- -llmwiki is provider-portable. Whether you have an Anthropic API key, a GitHub Copilot subscription, a locally-running Ollama server, or just a Claude Code login, you can point llmwiki at the right backend with a handful of environment variables — no config files required for most setups. Choose the provider that matches your existing credentials and infrastructure. +llmwiki is provider-portable. Whether you have an Anthropic API key, a GitHub Copilot subscription, a locally-running Ollama server, or just a Claude Code login, you can point llmwiki at the right backend with a handful of environment variables - no config files required for most setups. Choose the provider that matches your existing credentials and infrastructure. ## Configuration precedence When you use the Anthropic provider (the default), llmwiki resolves credentials in this order: 1. Shell environment variables or a `.env` file in your project directory -2. Claude Code settings fallback — `~/.claude/settings.json` → `env` block +2. Claude Code settings fallback - `~/.claude/settings.json` → `env` block 3. Built-in provider defaults (where applicable) This means that if you already have Claude Code configured on your machine, you can run `llmwiki compile` without exporting a single variable. 
@@ -25,13 +25,13 @@ The Anthropic provider uses the official `@anthropic-ai/sdk` to call Claude dire **Authentication** -Set either `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` — either one satisfies authentication. You do not need both. +Set either `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` - either one satisfies authentication. You do not need both. | Variable | Purpose | |---|---| | `ANTHROPIC_API_KEY` | Standard Anthropic API key | | `ANTHROPIC_AUTH_TOKEN` | Alternative auth token (accepted by Anthropic-compatible gateways) | -| `ANTHROPIC_BASE_URL` | Optional — custom endpoint for proxies or alternate Claude gateways | +| `ANTHROPIC_BASE_URL` | Optional - custom endpoint for proxies or alternate Claude gateways | `ANTHROPIC_BASE_URL` accepts any valid HTTP or HTTPS URL. Claude-style path endpoints such as `https://api.example.com/coding/` are supported; trailing slashes are normalized automatically. @@ -65,7 +65,7 @@ llmwiki compile -The `claude-agent` provider routes calls through the [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-typescript) instead of the raw Messages API. It authenticates using your **local Claude Code login** (OAuth/subscription) — no `ANTHROPIC_API_KEY` is required. If you can run `claude` in your terminal, this provider works. +The `claude-agent` provider routes calls through the [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-typescript) instead of the raw Messages API. It authenticates using your **local Claude Code login** (OAuth/subscription) - no `ANTHROPIC_API_KEY` is required. If you can run `claude` in your terminal, this provider works. **Setup** @@ -228,10 +228,10 @@ A truncation warning prints to stderr when the cap fires, naming the concept tha ## Output language -Generated wiki content defaults to whatever language the model produces from the source material — typically English. 
You can override this two ways: +Generated wiki content defaults to whatever language the model produces from the source material - typically English. You can override this two ways: -- `LLMWIKI_OUTPUT_LANG` — applies to every prompt the compile and query pipelines make. For example: `zh-CN`, `Chinese`, `ja`, `Japanese`. -- `--lang ` on `llmwiki compile` or `llmwiki query` — same effect, scoped to one invocation. Wins over the env var. +- `LLMWIKI_OUTPUT_LANG` - applies to every prompt the compile and query pipelines make. For example: `zh-CN`, `Chinese`, `ja`, `Japanese`. +- `--lang ` on `llmwiki compile` or `llmwiki query` - same effect, scoped to one invocation. Wins over the env var. ```bash export LLMWIKI_OUTPUT_LANG=zh-CN diff --git a/docs/configuration/review-policy.mdx b/docs/configuration/review-policy.mdx index 9e95f0b..3bb201e 100644 --- a/docs/configuration/review-policy.mdx +++ b/docs/configuration/review-policy.mdx @@ -1,10 +1,10 @@ --- -title: "Review Policy — Auto-Hold Risky Generated Pages" +title: "Review Policy - Auto-Hold Risky Generated Pages" sidebarTitle: "Review Policy" description: "Configure .llmwiki/config.json to automatically hold low-confidence, contradicted, or schema-violating pages for review instead of writing them live." --- -By default, `llmwiki compile` writes every generated page directly to `wiki/`. You can change that behavior in two ways. The `--review` flag is all-or-nothing: every candidate goes to the review queue and nothing lands in `wiki/` until you approve it. A **review policy** is more surgical — it lets a normal `compile` run write most pages live while automatically holding back the ones that trip specific risk conditions. +By default, `llmwiki compile` writes every generated page directly to `wiki/`. You can change that behavior in two ways. The `--review` flag is all-or-nothing: every candidate goes to the review queue and nothing lands in `wiki/` until you approve it. 
A **review policy** is more surgical - it lets a normal `compile` run write most pages live while automatically holding back the ones that trip specific risk conditions. With a policy in place, a low-confidence page gets queued for your review while an unambiguous page is written immediately. You get confidence-appropriate friction without slowing down the happy path. @@ -22,23 +22,23 @@ Create `.llmwiki/config.json` in your project root (alongside `sources/` and `wi } ``` -An absent config file, a missing `review` key, or `"hold": []` all mean **off** — compile writes pages directly, matching today's default behavior. +An absent config file, a missing `review` key, or `"hold": []` all mean **off** - compile writes pages directly, matching today's default behavior. ## Hold modes A page is held as a candidate if it trips **any** enabled mode (union semantics). The live `wiki/` page is left untouched when a page is held. -- **`low-confidence`** — holds any page whose `confidence` frontmatter field is below `lowConfidenceThreshold` (default `0.5`). Pages with no `confidence` field are held by default. Set `"treatMissingConfidenceAs": "ok"` in the `review` object to let them pass instead. +- **`low-confidence`** - holds any page whose `confidence` frontmatter field is below `lowConfidenceThreshold` (default `0.5`). Pages with no `confidence` field are held by default. Set `"treatMissingConfidenceAs": "ok"` in the `review` object to let them pass instead. -- **`contradicted`** — holds any page that declares one or more `contradictedBy` entries in its frontmatter, indicating the compiled content conflicts with another page. +- **`contradicted`** - holds any page that declares one or more `contradictedBy` entries in its frontmatter, indicating the compiled content conflicts with another page. -- **`schema-violating`** — holds any page that fails a schema cross-link rule defined in `.llmwiki/schema.json`. 
For example, a `comparison` page with fewer than the minimum required wikilinks trips this mode. +- **`schema-violating`** - holds any page that fails a schema cross-link rule defined in `.llmwiki/schema.json`. For example, a `comparison` page with fewer than the minimum required wikilinks trips this mode. -- **`provenance-violating`** — holds any page whose citations are broken or malformed — pointing to missing source files, impossible line ranges, or unparseable citation markers. +- **`provenance-violating`** - holds any page whose citations are broken or malformed - pointing to missing source files, impossible line ranges, or unparseable citation markers. -- **`all`** — holds every generated page, regardless of any other condition. Equivalent to running `compile --review` but expressed as a persistent policy. +- **`all`** - holds every generated page, regardless of any other condition. Equivalent to running `compile --review` but expressed as a persistent policy. -- **`off`** or `"hold": []` — disables the policy entirely. All pages are written live. +- **`off`** or `"hold": []` - disables the policy entirely. All pages are written live. ## Fail-closed behavior @@ -55,14 +55,14 @@ When a page is held by the policy: After every compile, llmwiki reports the split between written and held pages: ``` -Wrote 8 page(s), held 2 for review — run `llmwiki review list` +Wrote 8 page(s), held 2 for review - run `llmwiki review list` ``` ## What is and isn't policy-gated -- **`llmwiki compile`** — honors the review policy. -- **`llmwiki refresh --stale`** — honors the same policy when recompiling stale or orphaned pages. -- **`llmwiki query --save`** — is **not** policy-gated in the current version. Saved query answers are written directly to `wiki/queries/`. +- **`llmwiki compile`** - honors the review policy. +- **`llmwiki refresh --stale`** - honors the same policy when recompiling stale or orphaned pages. 
+- **`llmwiki query --save`** - is **not** policy-gated in the current version. Saved query answers are written directly to `wiki/queries/`. ## Managing the review queue diff --git a/docs/configuration/schema.mdx b/docs/configuration/schema.mdx index deee90a..54b0bfd 100644 --- a/docs/configuration/schema.mdx +++ b/docs/configuration/schema.mdx @@ -4,7 +4,7 @@ sidebarTitle: "Schema" description: "Define .llmwiki/schema.json to enforce page kinds, minimum wikilinks per kind, and seed pages that compile materializes automatically." --- -The schema layer is entirely optional. Without a schema file, llmwiki compiles every page as a `concept` and applies no cross-link minimums — existing wikis continue to work exactly as they did before the schema layer existed. You only need a schema when you want to enforce structure: typed page kinds, minimum wikilink counts per kind, or seed pages that the compiler should materialize automatically. +The schema layer is entirely optional. Without a schema file, llmwiki compiles every page as a `concept` and applies no cross-link minimums - existing wikis continue to work exactly as they did before the schema layer existed. You only need a schema when you want to enforce structure: typed page kinds, minimum wikilink counts per kind, or seed pages that the compiler should materialize automatically. ## Initializing a schema @@ -15,7 +15,7 @@ llmwiki schema init # writes a starter .llmwiki/schema.json llmwiki schema show # prints the resolved schema for the current project ``` -`schema init` writes a template file you can edit. `schema show` always prints the fully-resolved schema, merging your file onto built-in defaults — useful for confirming what the compiler is actually using. +`schema init` writes a template file you can edit. `schema show` always prints the fully-resolved schema, merging your file onto built-in defaults - useful for confirming what the compiler is actually using. 
## Schema file location @@ -36,7 +36,7 @@ llmwiki supports four page kinds. The compiler uses the kind as context when gen | Kind | Description | Default `minWikilinks` | |---|---|---| | `concept` | A standalone idea, technique, or pattern worth documenting | `0` | -| `entity` | A specific thing — a person, product, organization, or named artifact | `1` | +| `entity` | A specific thing - a person, product, organization, or named artifact | `1` | | `comparison` | A side-by-side analysis weighing two or more concepts or entities | `2` | | `overview` | A top-down map page that situates several concepts within a domain | `3` | @@ -79,7 +79,7 @@ A schema file is a JSON (or YAML) document with the following shape: } ``` -Every field is optional — you only need to specify what you want to override. Missing fields inherit their built-in defaults, so a minimal schema that only raises `minWikilinks` for `overview` pages is perfectly valid. +Every field is optional - you only need to specify what you want to override. Missing fields inherit their built-in defaults, so a minimal schema that only raises `minWikilinks` for `overview` pages is perfectly valid. ## Seed pages @@ -90,7 +90,7 @@ Every field is optional — you only need to specify what you want to override. 
| `title` | Yes | Display title; also used to derive the page slug | | `kind` | Yes | One of `concept`, `entity`, `comparison`, `overview` | | `summary` | No | One-line summary written into frontmatter | -| `relatedSlugs` | No | For `overview` and `comparison` kinds — slugs of pages the compiler should weave together as source material | +| `relatedSlugs` | No | For `overview` and `comparison` kinds - slugs of pages the compiler should weave together as source material | ## How the schema affects compile @@ -112,7 +112,7 @@ When you run `llmwiki compile`: When a review policy is active with `schema-violating` in the `hold` array, any page that fails a schema cross-link rule is automatically held for review instead of written live. See [Review Policy](/configuration/review-policy) for how to configure that behavior. - A schema is most useful for **large wikis**, **domain templates**, and **structured knowledge bases** where you want consistent page structure enforced over time. For a personal notebook or exploratory wiki, you likely don't need one — start without a schema and add one when you find yourself wanting to enforce cross-link density or generate overview pages automatically. + A schema is most useful for **large wikis**, **domain templates**, and **structured knowledge bases** where you want consistent page structure enforced over time. For a personal notebook or exploratory wiki, you likely don't need one - start without a schema and add one when you find yourself wanting to enforce cross-link density or generate overview pages automatically. --- diff --git a/docs/guides/atomic-memory-bridge.mdx b/docs/guides/atomic-memory-bridge.mdx index d941154..10cf402 100644 --- a/docs/guides/atomic-memory-bridge.mdx +++ b/docs/guides/atomic-memory-bridge.mdx @@ -4,7 +4,7 @@ sidebarTitle: "Atomic Memory" description: "Use llmwiki export --target json with @atomicmemory/llmwiki to import compiled wiki pages as durable Atomic Memory records with full metadata." 
--- -llmwiki and [Atomic Memory](https://github.com/atomicstrata/atomicmemory) are two complementary layers of open context infrastructure. llmwiki is a persistent, disk-backed knowledge base — you compile sources into typed, citation-traced markdown pages that accumulate over time and are browsable without any agent involved. Atomic Memory is a runtime memory layer for agents — searchable, correctable, scoped records that agents read and write during their work sessions. +llmwiki and [Atomic Memory](https://github.com/atomicstrata/atomicmemory) are two complementary layers of open context infrastructure. llmwiki is a persistent, disk-backed knowledge base - you compile sources into typed, citation-traced markdown pages that accumulate over time and are browsable without any agent involved. Atomic Memory is a runtime memory layer for agents - searchable, correctable, scoped records that agents read and write during their work sessions. These tools are independently valuable, and you do not need both. But when you are building agent workflows that need structured, cited knowledge as grounded context, the bridge between them closes the gap: `llmwiki export --target json` produces a typed envelope that `@atomicmemory/llmwiki` ingests as one Atomic Memory record per wiki page, with all advisory metadata preserved. @@ -13,10 +13,10 @@ These tools are independently valuable, and you do not need both. 
But when you a | | llmwiki | Atomic Memory | |--|---------|---------------| | **Primary artifact** | Compiled markdown wiki on disk | Runtime memory records | -| **Browsable by humans** | Yes — local viewer, Obsidian-compatible | Via inspection tools | -| **Citation-traced** | Yes — paragraph and claim-level | Preserved from llmwiki via bridge | -| **Durable across sessions** | Yes — lives on disk, version-controllable | Yes — durable store | -| **Agent-writable at runtime** | Via MCP tools or SDK | Yes — agents update records directly | +| **Browsable by humans** | Yes - local viewer, Obsidian-compatible | Via inspection tools | +| **Citation-traced** | Yes - paragraph and claim-level | Preserved from llmwiki via bridge | +| **Durable across sessions** | Yes - lives on disk, version-controllable | Yes - durable store | +| **Agent-writable at runtime** | Via MCP tools or SDK | Yes - agents update records directly | | **Best for** | Notebook, RAG index, CI knowledge base, domain pack source | Agent working memory, correctable context, scoped per-project | The bridge flows one direction: llmwiki compiles and exports → Atomic Memory ingests. On re-export with the same `--project-id`, the bridge updates existing records rather than creating duplicates. @@ -37,7 +37,7 @@ The bridge flows one direction: llmwiki compiles and exports → Atomic Memory i - Run `llmwiki export` with `--target json` and a stable `--project-id`. The project ID pins a consistent identifier inside the JSON envelope so the bridge can derive deterministic external IDs for each page — re-exporting with the same ID updates existing Atomic Memory records rather than duplicating them. + Run `llmwiki export` with `--target json` and a stable `--project-id`. The project ID pins a consistent identifier inside the JSON envelope so the bridge can derive deterministic external IDs for each page - re-exporting with the same ID updates existing Atomic Memory records rather than duplicating them. 
```bash llmwiki export --target json --project-id my-project @@ -63,9 +63,9 @@ The bridge flows one direction: llmwiki compiles and exports → Atomic Memory i When you pass `--project-id my-project`, llmwiki embeds that string in the JSON export envelope alongside each page. The bridge uses the project ID and the page slug together to derive a deterministic external ID for each Atomic Memory record. -This means re-exporting after a recompile and re-importing will **update** the existing records rather than insert duplicates — provided you use the same `--project-id` value each time. Without a project ID, the bridge cannot guarantee stable external IDs across exports. +This means re-exporting after a recompile and re-importing will **update** the existing records rather than insert duplicates - provided you use the same `--project-id` value each time. Without a project ID, the bridge cannot guarantee stable external IDs across exports. -Choose a short, stable, slug-like value — something that describes the project and does not change between runs. For example: `engineering-handbook`, `ml-research-q4`, `onboarding-docs`. +Choose a short, stable, slug-like value - something that describes the project and does not change between runs. For example: `engineering-handbook`, `ml-research-q4`, `onboarding-docs`. ## JSON envelope shape @@ -74,14 +74,14 @@ The JSON export preserves the full per-page advisory metadata that llmwiki track | Field | Description | |-------|-------------| | `path` | Relative path to the wiki page file (e.g. 
`wiki/concepts/knowledge-compilation.md`) | -| `kind` | Page type — `concept`, `entity`, `comparison`, or `overview` | +| `kind` | Page type - `concept`, `entity`, `comparison`, or `overview` | | `confidence` | LLM-reported confidence in the synthesized page (0–1), if present | -| `provenanceState` | How the page was produced — `extracted`, `merged`, `inferred`, or `ambiguous` | -| `citations` | Flattened list of citations — each with `file`, and optionally `start`/`end` line numbers | +| `provenanceState` | How the page was produced - `extracted`, `merged`, `inferred`, or `ambiguous` | +| `citations` | Flattened list of citations - each with `file`, and optionally `start`/`end` line numbers | | `aliases` | Array of alternative titles declared in the page's frontmatter | | `freshnessStatus` | Whether the page is `fresh`, `stale`, or `orphaned` relative to its sources at export time | -This metadata is advisory — it reflects the state of the wiki at the time of export and is not re-checked by Atomic Memory at query time. +This metadata is advisory - it reflects the state of the wiki at the time of export and is not re-checked by Atomic Memory at query time. ## When to use each tool diff --git a/docs/guides/ci-quality-gates.mdx b/docs/guides/ci-quality-gates.mdx index 73ee9f8..d3cc40b 100644 --- a/docs/guides/ci-quality-gates.mdx +++ b/docs/guides/ci-quality-gates.mdx @@ -4,7 +4,7 @@ sidebarTitle: "CI Quality Gates" description: "Use llmwiki eval with threshold configuration to fail CI when wiki health, citation coverage, or citation quality drops below your defined minimums." --- -A compiled wiki is a living artifact. Every time you ingest new sources and recompile, you introduce changes — new pages, updated content, shifted citations. Without a quality gate, a recompile can silently degrade your wiki: citation precision can drop, health score can fall, and stale provenance can creep in unnoticed. +A compiled wiki is a living artifact. 
Every time you ingest new sources and recompile, you introduce changes - new pages, updated content, shifted citations. Without a quality gate, a recompile can silently degrade your wiki: citation precision can drop, health score can fall, and stale provenance can creep in unnoticed. `llmwiki eval` gives you a quantitative score for each of these dimensions and exits non-zero when any configured threshold is breached. Wire it into CI and you get the same protection for your knowledge base that lint and tests give you for your code. @@ -20,7 +20,7 @@ The report always prints regardless of exit code, so you can read exactly which ## Threshold configuration -Create `.llmwiki/eval/thresholds.yaml` in your project to configure minimum acceptable scores. All fields are optional — omit any field to skip checking that metric. +Create `.llmwiki/eval/thresholds.yaml` in your project to configure minimum acceptable scores. All fields are optional - omit any field to skip checking that metric. ```yaml health_score: 85 @@ -35,7 +35,7 @@ claim_level_citation_rate: 0.5 ### Field reference - Composite lint health score. Aggregates all lint rules — errors (broken citations, broken wikilinks, duplicate concepts) cost more than warnings. A freshly compiled, well-cited wiki with no broken links typically scores above 90. + Composite lint health score. Aggregates all lint rules - errors (broken citations, broken wikilinks, duplicate concepts) cost more than warnings. A freshly compiled, well-cited wiki with no broken links typically scores above 90. @@ -47,15 +47,15 @@ claim_level_citation_rate: 0.5 - Average LLM-judged support score across a sample of `(claim, source span)` pairs. The judge scores each pair 0 (unsupported), 1 (partially supported), or 2 (fully supported). **This threshold is only checked when you run `--suite full`** — it is silently skipped on the fast suite. + Average LLM-judged support score across a sample of `(claim, source span)` pairs. 
The judge scores each pair 0 (unsupported), 1 (partially supported), or 2 (fully supported). **This threshold is only checked when you run `--suite full`** - it is silently skipped on the fast suite. - Fraction of valid ingested sources that are actually cited by at least one wiki page. A value below 1.0 means some sources were compiled but none of the resulting pages cite them — which may indicate orphaned content or extraction gaps. + Fraction of valid ingested sources that are actually cited by at least one wiki page. A value below 1.0 means some sources were compiled but none of the resulting pages cite them - which may indicate orphaned content or extraction gaps. - Maximum number of excluded sources permitted. Sources are excluded from compilation when they cannot be safely processed — for example, out-of-tree symlinks that fall outside `sources/`. Setting this to `0` fails the build if any such sources exist. + Maximum number of excluded sources permitted. Sources are excluded from compilation when they cannot be safely processed - for example, out-of-tree symlinks that fall outside `sources/`. Setting this to `0` fails the build if any such sources exist. @@ -94,7 +94,7 @@ jobs: ``` - `--suite fast` checks health score, citation coverage, citation precision, source utilization, source warnings, and claim-level citation rate without any LLM API calls. If you include `citation_support_mean` in your thresholds, switch to `--suite full` — but be aware it will use your LLM provider and incur token costs proportional to the number of citations sampled. + `--suite fast` checks health score, citation coverage, citation precision, source utilization, source warnings, and claim-level citation rate without any LLM API calls. If you include `citation_support_mean` in your thresholds, switch to `--suite full` - but be aware it will use your LLM provider and incur token costs proportional to the number of citations sampled. 
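The threshold rules described above (every field optional, `citation_support_mean` skipped on the fast suite) can be sketched as a small check. This is a hypothetical illustration, not llmwiki's implementation: `findViolations`, `Metrics`, and the choice to skip a metric that is absent from the report are assumptions; only the field names come from `thresholds.yaml`.

```typescript
// Hypothetical sketch of threshold checking; not llmwiki's actual code.
type Suite = "fast" | "full";

// Keys mirror thresholds.yaml. Every field is optional.
interface Thresholds {
  health_score?: number;
  citation_coverage_percent?: number;
  citation_support_mean?: number;
  claim_level_citation_rate?: number;
}

type Metrics = Partial<Record<keyof Thresholds, number>>;

// Metrics that need the LLM judge are only checked on the full suite.
const FULL_SUITE_ONLY = new Set<keyof Thresholds>(["citation_support_mean"]);

function findViolations(metrics: Metrics, thresholds: Thresholds, suite: Suite): string[] {
  const violations: string[] = [];
  for (const key of Object.keys(thresholds) as (keyof Thresholds)[]) {
    const minimum = thresholds[key];
    const actual = metrics[key];
    if (minimum === undefined) continue; // omitted field: metric is not checked
    if (suite === "fast" && FULL_SUITE_ONLY.has(key)) continue; // silently skipped
    if (actual === undefined) continue; // assumption: a metric missing from the report is skipped
    if (actual < minimum) {
      violations.push(`${key}: ${actual} is below the minimum ${minimum}`);
    }
  }
  return violations;
}
```

A CI wrapper built on this idea would exit non-zero whenever the returned list is non-empty, which matches the exit behavior the guide describes for `llmwiki eval`.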
## Tracking history @@ -106,11 +106,11 @@ llmwiki eval history # shows all recorded runs llmwiki eval history --n 10 # limit to the last 10 entries ``` -The trend table shows health score, citation coverage, citation precision, and corpus stats (page count, source count, wiki character count) for each run — making it easy to spot regressions after a recompile. +The trend table shows health score, citation coverage, citation precision, and corpus stats (page count, source count, wiki character count) for each run - making it easy to spot regressions after a recompile. ## Regression deltas -Each eval report is automatically diffed against the previous entry in `history.jsonl`. The report prints delta values — for example `health_score: 91 (−4)` — so you can see at a glance whether a recompile improved or degraded your wiki quality. Deltas are also available in the structured `EvalReport.delta` field when using the SDK. +Each eval report is automatically diffed against the previous entry in `history.jsonl`. The report prints delta values - for example `health_score: 91 (−4)` - so you can see at a glance whether a recompile improved or degraded your wiki quality. Deltas are also available in the structured `EvalReport.delta` field when using the SDK. ## Re-printing the latest report @@ -123,7 +123,7 @@ llmwiki eval report This is useful for reviewing results in CI logs or sharing the current state of a wiki without triggering another eval run. - Start with lenient thresholds and tighten them over time as you improve your wiki's citation hygiene. A reasonable starting point for a new wiki might be `health_score: 70` and `citation_coverage_percent: 50` — enough to catch severe regressions without blocking every early commit. As your corpus matures and citations stabilize, raise the thresholds toward your target quality bar. + Start with lenient thresholds and tighten them over time as you improve your wiki's citation hygiene. 
A reasonable starting point for a new wiki might be `health_score: 70` and `citation_coverage_percent: 50` - enough to catch severe regressions without blocking every early commit. As your corpus matures and citations stabilize, raise the thresholds toward your target quality bar. ## Next steps diff --git a/docs/guides/mcp-agent-integration.mdx b/docs/guides/mcp-agent-integration.mdx index 26d2de8..278cdbc 100644 --- a/docs/guides/mcp-agent-integration.mdx +++ b/docs/guides/mcp-agent-integration.mdx @@ -4,7 +4,7 @@ sidebarTitle: "MCP Integration" description: "Set up the llmwiki MCP server to let Claude Desktop, Cursor, Claude Code, and other MCP agents ingest, compile, and query your wiki directly." --- -The llmwiki MCP server turns any MCP-compatible agent into a first-class wiki collaborator. Instead of scripting CLI calls and parsing stdout, agents can call structured tools — `compile_wiki`, `query_wiki`, `get_context_pack`, and more — and receive typed JSON results they can reason over. Read-only tools work without credentials, so an agent can browse, lint, and inspect your wiki before ever touching the LLM pipeline. +The llmwiki MCP server turns any MCP-compatible agent into a first-class wiki collaborator. Instead of scripting CLI calls and parsing stdout, agents can call structured tools - `compile_wiki`, `query_wiki`, `get_context_pack`, and more - and receive typed JSON results they can reason over. Read-only tools work without credentials, so an agent can browse, lint, and inspect your wiki before ever touching the LLM pipeline. This guide walks you through starting the server, wiring it into Claude Desktop and Cursor, and understanding exactly what your agents can do once connected. @@ -18,7 +18,7 @@ This guide walks you through starting the server, wiring it into Claude Desktop - Run `llmwiki serve` from anywhere, pointing `--root` at your wiki project directory. 
The server uses stdio transport and requires no credentials at startup — read-only tools work immediately. + Run `llmwiki serve` from anywhere, pointing `--root` at your wiki project directory. The server uses stdio transport and requires no credentials at startup - read-only tools work immediately. ```bash llmwiki serve --root /path/to/wiki-project @@ -28,7 +28,7 @@ This guide walks you through starting the server, wiring it into Claude Desktop - Open your Claude Desktop MCP configuration file (`claude_desktop_config.json`) and add an `llmwiki` entry under `mcpServers`. The `npx` invocation launches the server on demand — you do not need to keep a separate terminal session open. + Open your Claude Desktop MCP configuration file (`claude_desktop_config.json`) and add an `llmwiki` entry under `mcpServers`. The `npx` invocation launches the server on demand - you do not need to keep a separate terminal session open. ```json { @@ -48,7 +48,7 @@ This guide walks you through starting the server, wiring it into Claude Desktop - Cursor uses the same MCP config format. Add the same JSON block to your Cursor MCP settings file. The `env` block is where you supply provider credentials — Cursor passes them through to the server process. + Cursor uses the same MCP config format. Add the same JSON block to your Cursor MCP settings file. The `env` block is where you supply provider credentials - Cursor passes them through to the server process. ```json { @@ -104,9 +104,9 @@ The server exposes nine tools. 
Read-only tools work without provider credentials | `lint_wiki` | Run quality checks; returns structured diagnostics | No | | `wiki_status` | Page and source counts, stale/orphaned pages, `stateStatus`, `pendingCandidates` | No | | `get_context_pack` | Build a token-budgeted evidence pack (primary pages, chunks, graph neighbors, citations, freshness, warnings, suggested actions) | No | -| `run_eval` | Score wiki quality — fast suite is credential-free; full suite LLM-judges citation samples | Fast: No / Full: Yes | +| `run_eval` | Score wiki quality - fast suite is credential-free; full suite LLM-judges citation samples | Fast: No / Full: Yes | -**`get_context_pack` vs `query_wiki`:** these tools serve different purposes. `get_context_pack` *packages evidence* — it assembles a structured JSON envelope of relevant pages, semantic chunks, and citations that the agent uses for its own reasoning. `query_wiki` *generates an answer* — it calls the LLM and returns prose. Use `get_context_pack` when the agent needs grounded source material to reason over; use `query_wiki` when you want a finished answer. +**`get_context_pack` vs `query_wiki`:** these tools serve different purposes. `get_context_pack` *packages evidence* - it assembles a structured JSON envelope of relevant pages, semantic chunks, and citations that the agent uses for its own reasoning. `query_wiki` *generates an answer* - it calls the LLM and returns prose. Use `get_context_pack` when the agent needs grounded source material to reason over; use `query_wiki` when you want a finished answer. 
### MCP resources @@ -115,10 +115,10 @@ Resources are passive read views of the wiki that agents can attach as context w | URI | Returns | |-----|---------| | `llmwiki://index` | Full `wiki/index.md` (auto-generated table of contents) | -| `llmwiki://concept/{slug}` | A single concept page — frontmatter and body | +| `llmwiki://concept/{slug}` | A single concept page - frontmatter and body | | `llmwiki://query/{slug}` | A single saved query page | | `llmwiki://sources` | List of ingested source files with metadata | -| `llmwiki://state` | Compilation state — per-source hashes and last compile times | +| `llmwiki://state` | Compilation state - per-source hashes and last compile times | | `llmwiki://eval/report` | The most recent eval report | | `llmwiki://eval/history` | Trend table of past eval runs | @@ -126,30 +126,30 @@ Resources are passive read views of the wiki that agents can attach as context w The server starts without credentials. Provider credentials are read from the `env` block in your MCP config and forwarded to each pipeline call: -- **Read-only tools** (`read_page`, `lint_wiki`, `wiki_status`, `ingest_source`) — always work, no credentials needed. -- **LLM-dependent tools** (`compile_wiki`, `query_wiki`, `search_pages`) — require a valid provider credential (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.) in the environment at call time. -- **`get_context_pack`** — credential-free by default. When embeddings are available and provider credentials are present, semantic retrieval is used. Without credentials or an embedding store, the tool falls back to lexical ranking and surfaces an `embedding-store-missing` or `query-embedding-unavailable` warning in the response — it does not fail. -- **`run_eval`** — the `fast` suite (health score, citation coverage, corpus stats) runs without any credentials. The `full` suite adds LLM-judged citation scoring and requires a provider. 
+- **Read-only tools** (`read_page`, `lint_wiki`, `wiki_status`, `ingest_source`) - always work, no credentials needed. +- **LLM-dependent tools** (`compile_wiki`, `query_wiki`, `search_pages`) - require a valid provider credential (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.) in the environment at call time. +- **`get_context_pack`** - credential-free by default. When embeddings are available and provider credentials are present, semantic retrieval is used. Without credentials or an embedding store, the tool falls back to lexical ranking and surfaces an `embedding-store-missing` or `query-embedding-unavailable` warning in the response - it does not fail. +- **`run_eval`** - the `fast` suite (health score, citation coverage, corpus stats) runs without any credentials. The `full` suite adds LLM-judged citation scoring and requires a provider. ## `get_context_pack` details `get_context_pack` accepts several optional parameters for tuning the evidence pack: -- `prompt` *(required)* — the task or question to assemble context for -- `budget` — approximate output token budget (default 8 000) -- `depth` — graph neighborhood traversal depth, `0`–`2` (default `1`; `0` disables graph expansion) -- `topPages` — maximum primary pages to include (default `5`, max `20`) -- `topChunks` — maximum semantic chunks to surface (default `8`, max `50`) -- `includeSources` *(opt-in)* — when `true`, materializes raw source line windows from claim-level citations +- `prompt` *(required)* - the task or question to assemble context for +- `budget` - approximate output token budget (default 8 000) +- `depth` - graph neighborhood traversal depth, `0`–`2` (default `1`; `0` disables graph expansion) +- `topPages` - maximum primary pages to include (default `5`, max `20`) +- `topChunks` - maximum semantic chunks to surface (default `8`, max `50`) +- `includeSources` *(opt-in)* - when `true`, materializes raw source line windows from claim-level citations **`includeSources` is opt-in** because it 
returns raw content from files under `sources/`. Path confinement prevents reads outside `sources/`, but only enable source windows for agents you trust with the ingested source text. ## Using `ingest_source` safely -`ingest_source` accepts a URL or an absolute local file path. Because it performs a server-side fetch and local file read, it is a **trusted-input-only** primitive — pass only URLs and paths you control. For untrusted or user-supplied content, use the SDK's `ingestText` instead (see the [SDK guide](/guides/sdk)). +`ingest_source` accepts a URL or an absolute local file path. Because it performs a server-side fetch and local file read, it is a **trusted-input-only** primitive - pass only URLs and paths you control. For untrusted or user-supplied content, use the SDK's `ingestText` instead (see the [SDK guide](/guides/sdk)). - After an agent calls `compile_wiki`, check `wiki_status` and look at the `pendingCandidates` field. If your wiki has a review policy configured in `.llmwiki/config.json`, some pages may be held for human review rather than written directly. A non-zero `pendingCandidates` means the agent has queued work that needs your approval before those pages go live — run `llmwiki review list` to inspect them. + After an agent calls `compile_wiki`, check `wiki_status` and look at the `pendingCandidates` field. If your wiki has a review policy configured in `.llmwiki/config.json`, some pages may be held for human review rather than written directly. A non-zero `pendingCandidates` means the agent has queued work that needs your approval before those pages go live - run `llmwiki review list` to inspect them. 
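The `get_context_pack` parameters listed earlier in this guide can be sketched as a client-side helper that fills in the documented defaults before building an MCP `tools/call` request. This is a minimal sketch under stated assumptions: `normalizeContextPackArgs` and `clamp` are hypothetical names, the lower bound of `1` for `topPages` and `topChunks` is assumed (the guide only states maximums), and whether the server itself clamps or rejects out-of-range values is not specified here.

```typescript
// Hypothetical helper: defaults and maximums are taken from the parameter list
// in the guide; the function itself is not part of llmwiki.
interface ContextPackArgs {
  prompt: string;
  budget?: number;
  depth?: number;
  topPages?: number;
  topChunks?: number;
  includeSources?: boolean;
}

// Truncate to an integer and keep it inside [min, max].
const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, Math.trunc(value)));

function normalizeContextPackArgs(args: ContextPackArgs): Required<ContextPackArgs> {
  if (args.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  return {
    prompt: args.prompt,
    budget: args.budget ?? 8000,                 // documented default
    depth: clamp(args.depth ?? 1, 0, 2),         // 0 disables graph expansion
    topPages: clamp(args.topPages ?? 5, 1, 20),  // lower bound of 1 is an assumption
    topChunks: clamp(args.topChunks ?? 8, 1, 50),
    includeSources: args.includeSources ?? false, // opt-in only
  };
}

// An MCP `tools/call` request carrying the normalized arguments.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "get_context_pack",
    arguments: normalizeContextPackArgs({ prompt: "How does review work?", depth: 5 }),
  },
};
```

Keeping `includeSources` off unless a caller sets it explicitly mirrors the opt-in stance the guide takes toward raw source text.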
## Next steps diff --git a/docs/guides/sdk.mdx b/docs/guides/sdk.mdx index 686f6f5..b6cd23c 100644 --- a/docs/guides/sdk.mdx +++ b/docs/guides/sdk.mdx @@ -1,12 +1,12 @@ --- title: "Using the llmwiki SDK for Programmatic Wiki Control" sidebarTitle: "SDK" -description: "Use createWiki() to drive llmwiki in-process from TypeScript or JavaScript — ingest, compile, query, lint, and export without shelling out." +description: "Use createWiki() to drive llmwiki in-process from TypeScript or JavaScript - ingest, compile, query, lint, and export without shelling out." --- -The llmwiki SDK lets you drive the entire wiki pipeline from inside your own TypeScript or JavaScript application. Instead of spawning a child process and scraping CLI output, you import `createWiki` directly from the package and call methods that return typed results. Every method runs silently — no console output, no global state — and concurrent calls are fully isolated via `AsyncLocalStorage`. +The llmwiki SDK lets you drive the entire wiki pipeline from inside your own TypeScript or JavaScript application. Instead of spawning a child process and scraping CLI output, you import `createWiki` directly from the package and call methods that return typed results. Every method runs silently - no console output, no global state - and concurrent calls are fully isolated via `AsyncLocalStorage`. -Use the SDK when you are embedding llmwiki in your own tooling: a build script, a CI harness, a REST API, a test suite, or any context where you want structured results and clean composability. Use the CLI when you are working interactively from the terminal. Use the MCP server when you want AI agents to drive the pipeline — see [MCP Integration](/guides/mcp-agent-integration). +Use the SDK when you are embedding llmwiki in your own tooling: a build script, a CI harness, a REST API, a test suite, or any context where you want structured results and clean composability. 
Use the CLI when you are working interactively from the terminal. Use the MCP server when you want AI agents to drive the pipeline - see [MCP Integration](/guides/mcp-agent-integration). ## Installation @@ -24,7 +24,7 @@ import { createWiki } from "llm-wiki-compiler"; ## Creating a wiki instance -`createWiki({ root })` returns a `Wiki` facade bound to a project directory. Pass an absolute or relative path — it is normalized once via `path.resolve` at construction time, so subsequent `cwd` changes in the calling process do not affect it. +`createWiki({ root })` returns a `Wiki` facade bound to a project directory. Pass an absolute or relative path - it is normalized once via `path.resolve` at construction time, so subsequent `cwd` changes in the calling process do not affect it. ```ts import { createWiki } from "llm-wiki-compiler"; @@ -32,7 +32,7 @@ import { createWiki } from "llm-wiki-compiler"; const wiki = createWiki({ root: "./my-wiki" }); ``` -A missing root is valid — `ingest` and `ingestText` create `sources/` via recursive `mkdir` on first write. If the path already exists and is not a directory, `createWiki` throws immediately with a clear error. +A missing root is valid - `ingest` and `ingestText` create `sources/` via recursive `mkdir` on first write. If the path already exists and is not a directory, `createWiki` throws immediately with a clear error. ## Complete example @@ -51,7 +51,7 @@ await wiki.compile(); const { answer } = await wiki.query("What did I note about X?"); console.log(answer); -// Read a read-only status snapshot — no credentials needed +// Read a read-only status snapshot - no credentials needed const status = await wiki.status(); console.log(`${status.pages.total} pages, ${status.sources} sources`); ``` @@ -63,11 +63,11 @@ console.log(`${status.pages.total} pages, ${status.sources} sources`); Fetch a URL or read a local file path into `sources/`. Does not require LLM credentials. 
- **Trusted input only.** This method performs a server-side fetch and local file read — it is an SSRF and path-traversal primitive. Pass only URLs and paths you control. For untrusted or user-supplied content, use `ingestText` instead. + **Trusted input only.** This method performs a server-side fetch and local file read - it is an SSRF and path-traversal primitive. Pass only URLs and paths you control. For untrusted or user-supplied content, use `ingestText` instead. - Ingest raw text directly as a source document. Does not require LLM credentials, and performs no network fetch or file read — making it the safe path for untrusted content. + Ingest raw text directly as a source document. Does not require LLM credentials, and performs no network fetch or file read - making it the safe path for untrusted content. Note: ingested content is later sent to the LLM during `compile`, so adversarially crafted text is still a prompt-injection vector at that stage. @@ -79,7 +79,7 @@ Both methods return `IngestResult` with `filename`, `chars`, and `truncated` fie Run the incremental compile pipeline. Extracts concepts from new or changed sources, generates typed wiki pages, resolves `[[wikilinks]]`, and rebuilds the index. **Requires LLM credentials.** - - `options.review` — when `true`, writes generated pages to `.llmwiki/candidates/` for review instead of directly to `wiki/`. Same behavior as `llmwiki compile --review`. + - `options.review` - when `true`, writes generated pages to `.llmwiki/candidates/` for review instead of directly to `wiki/`. Same behavior as `llmwiki compile --review`. Source content is sent to the configured LLM provider during compilation. Do not compile wikis containing confidential data unless the provider's data-handling policies are acceptable for that content. @@ -91,8 +91,8 @@ Both methods return `IngestResult` with `filename`, `chars`, and `truncated` fie Generate a grounded answer from the compiled wiki. 
**Requires LLM credentials.** - - `options.save` — persist the answer as a page in `wiki/queries/` and rebuild the index immediately; future queries use it as context. - - `options.debug` — include retrieval detail (selected chunks, scores) in the result. + - `options.save` - persist the answer as a page in `wiki/queries/` and rebuild the index immediately; future queries use it as context. + - `options.debug` - include retrieval detail (selected chunks, scores) in the result. Returns `QueryResult` with `answer`, `pages`, and (when `debug` is set) `debug` fields. @@ -118,23 +118,23 @@ Both methods return `IngestResult` with `filename`, `chars`, and `truncated` fie - Read a single source record by its basename (e.g. `"note.md"` — the `filename` field from `IngestResult`). Always includes body. Returns `null` if the source does not exist. No credentials required. + Read a single source record by its basename (e.g. `"note.md"` - the `filename` field from `IngestResult`). Always includes body. Returns `null` if the source does not exist. No credentials required. - Remove a source file from `sources/`. Returns `true` if deleted, `false` if not found. The compiled page in `wiki/` is not removed immediately — reconciliation happens on the next `compile()`. No credentials required. + Remove a source file from `sources/`. Returns `true` if deleted, `false` if not found. The compiled page in `wiki/` is not removed immediately - reconciliation happens on the next `compile()`. No credentials required. ### Status and quality - Return a read-only snapshot of the wiki — page counts, source counts, last compile time, stale and orphaned pages, pending changes, and `pendingCandidates`. No credentials required. + Return a read-only snapshot of the wiki - page counts, source counts, last compile time, stale and orphaned pages, pending changes, and `pendingCandidates`. No credentials required. 
**Performance note:** each call hashes the full source corpus (O(total source bytes)) with no cross-call caching. Do not call `status()` in a hot loop. - Run all lint rules and return a severity-counted summary — broken wikilinks, orphaned pages, duplicate concepts, empty pages, broken citations, stale pages, and more. No credentials required. + Run all lint rules and return a severity-counted summary - broken wikilinks, orphaned pages, duplicate concepts, empty pages, broken citations, stale pages, and more. No credentials required. **Performance note:** same per-call corpus-hashing cost as `status()`. Avoid hot loops. @@ -142,9 +142,9 @@ Both methods return `IngestResult` with `filename`, `chars`, and `truncated` fie Run the wiki quality eval harness. - - `mode: "fast"` — health score, citation coverage, corpus stats. **No credentials required.** - - `mode: "full"` — adds LLM-judged citation support scoring to the fast results. **Requires LLM credentials.** - - `record` — when `true`, appends the result to `.llmwiki/eval/history.jsonl` (default `false`). + - `mode: "fast"` - health score, citation coverage, corpus stats. **No credentials required.** + - `mode: "full"` - adds LLM-judged citation support scoring to the fast results. **Requires LLM credentials.** + - `record` - when `true`, appends the result to `.llmwiki/eval/history.jsonl` (default `false`). Returns `EvalReport` with `health`, `citationCoverage`, `sourceUtilization`, `citationDepth`, `stats`, regression `delta`, and `thresholdViolations`. @@ -152,19 +152,19 @@ Both methods return `IngestResult` with `filename`, `chars`, and `truncated` fie ### Context and export - Build a v1 context pack — the same JSON envelope as `llmwiki context --json` and the MCP `get_context_pack` tool. Returns primary pages, semantic chunks, graph neighbors, citations, per-page freshness, warnings, and suggested actions. 
+ Build a v1 context pack - the same JSON envelope as `llmwiki context --json` and the MCP `get_context_pack` tool. Returns primary pages, semantic chunks, graph neighbors, citations, per-page freshness, warnings, and suggested actions. - - `options.prompt` *(required)* — free-text task or question - - `options.budget` — approximate output token budget - - `options.depth` — graph traversal depth `0`–`2` - - `options.topPages` — max primary pages - - `options.topChunks` — max semantic chunks + - `options.prompt` *(required)* - free-text task or question + - `options.budget` - approximate output token budget + - `options.depth` - graph traversal depth `0`–`2` + - `options.topPages` - max primary pages + - `options.topChunks` - max semantic chunks - Semantic retrieval is opportunistic — when embeddings are not available the pack falls back to lexical ranking with a warning. No credentials required. + Semantic retrieval is opportunistic - when embeddings are not available the pack falls back to lexical ranking with a warning. No credentials required. - Export the compiled wiki as a structured JSON document — the same shape as `llmwiki export --target json`. No credentials required. + Export the compiled wiki as a structured JSON document - the same shape as `llmwiki export --target json`. No credentials required. **Performance note:** same per-call corpus-hashing cost as `status()` and `lint()`. Avoid hot loops. 
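The performance notes above warn that `status()`, `lint()`, and `exportJson()` re-hash the source corpus on every call. Until the SDK ships its own cache, one way a caller might avoid hot-loop costs is a small time-based wrapper. This is a hedged sketch, not an llmwiki API: `cacheFor` and its injectable `now` clock are hypothetical names introduced for illustration.

```typescript
// Hypothetical wrapper, not part of llmwiki: reuses the last result of an
// expensive call until a fixed time window has elapsed.
function cacheFor<T>(ttlMs: number, fn: () => T, now: () => number = Date.now): () => T {
  let cached: { value: T; expiresAt: number } | undefined;
  return () => {
    const t = now();
    // Recompute on first use or once the window has expired.
    if (cached === undefined || t >= cached.expiresAt) {
      cached = { value: fn(), expiresAt: t + ttlMs };
    }
    return cached.value;
  };
}
```

With a wiki instance in scope, `const cachedStatus = cacheFor(30_000, () => wiki.status())` would share one status promise across calls for 30 seconds. Two caveats follow from the design: the cached value is the promise itself, so a rejected call stays cached until the window expires, and results can lag behind an `ingest` or `compile` that happens inside the window.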
@@ -182,7 +182,7 @@ try { await wiki.compile(); } catch (err) { if (err instanceof ProviderUnavailableError) { - console.error("No LLM provider configured — set ANTHROPIC_API_KEY or equivalent."); + console.error("No LLM provider configured - set ANTHROPIC_API_KEY or equivalent."); } else if (err instanceof UnknownProviderError) { console.error("LLMWIKI_PROVIDER is set to an unrecognized value."); } else { @@ -210,13 +210,13 @@ import type { Several methods hash the full source corpus on every call with no cross-call cache: -- `status()` — O(total source bytes) per call -- `lint()` — O(total source bytes) per call -- `exportJson()` — O(total source bytes) per call +- `status()` - O(total source bytes) per call +- `lint()` - O(total source bytes) per call +- `exportJson()` - O(total source bytes) per call Avoid calling these methods in tight loops or on every request in a server. An mtime-keyed cache is planned for a future release. -`compile()` and `runEval({ mode: "full" })` can run for several minutes on large corpora. There is no progress callback in v1 — the call blocks until the pipeline finishes. +`compile()` and `runEval({ mode: "full" })` can run for several minutes on large corpora. There is no progress callback in v1 - the call blocks until the pipeline finishes. ## Next steps diff --git a/docs/images/okf-docs-banner.svg b/docs/images/okf-docs-banner.svg new file mode 100644 index 0000000..ebc9ccf --- /dev/null +++ b/docs/images/okf-docs-banner.svg @@ -0,0 +1,32 @@ + + Breaking News: llmwiki 0.10.0 supports Open Knowledge Format + llmwiki is now an OKF producer and consumer aligned with Google Cloud's emerging standard for portable knowledge sharing. 
+ + + + + + + + + + + + + BREAKING NEWS + + + + llmwiki 0.10.0 supports Open Knowledge Format + + + A producer and consumer for Google Cloud's emerging standard for portable knowledge sharing + + + + + llmwiki export --target okf + + llmwiki import --okf + + diff --git a/docs/installation.mdx b/docs/installation.mdx index 10126a5..2da41c5 100644 --- a/docs/installation.mdx +++ b/docs/installation.mdx @@ -1,20 +1,20 @@ --- title: "Installing llmwiki: npm Setup and Provider Configuration" sidebarTitle: "Installation" -description: "Install llmwiki globally with npm and configure your LLM provider — Anthropic, OpenAI-compatible, Ollama, GitHub Copilot, or Claude Agent SDK." +description: "Install llmwiki globally with npm and configure your LLM provider - Anthropic, OpenAI-compatible, Ollama, GitHub Copilot, or Claude Agent SDK." --- -llmwiki is a Node.js CLI that you install once globally with npm. After installation, you point it at an LLM provider by setting a handful of environment variables — or in some cases, no variables at all if you're already logged into Claude Code. This page covers the full installation process and every supported provider. +llmwiki is a Node.js CLI that you install once globally with npm. After installation, you point it at an LLM provider by setting a handful of environment variables - or in some cases, no variables at all if you're already logged into Claude Code. This page covers the full installation process and every supported provider. ## Requirements - llmwiki requires **Node.js >= 24**. This is a hard minimum — the package will not run on earlier versions. Check your current version with `node --version` and update via [nodejs.org](https://nodejs.org) or a version manager (`nvm`, `fnm`) if needed. + llmwiki requires **Node.js >= 24**. This is a hard minimum - the package will not run on earlier versions. 
Check your current version with `node --version` and update via [nodejs.org](https://nodejs.org) or a version manager (`nvm`, `fnm`) if needed. - **Node.js >= 24** (see warning above) - **npm** (bundled with Node.js) -- **Provider credentials** — required for compile and query; ingest is credential-free (see [Provider Setup](#provider-setup) below) +- **Provider credentials** - required for compile and query; ingest is credential-free (see [Provider Setup](#provider-setup) below) ## Install llmwiki @@ -34,11 +34,11 @@ llmwiki --help ## Provider Setup -llmwiki calls an LLM during the compile and query phases. It does **not** require credentials for `ingest`, `view`, `lint`, `status`, or any other read-only operation. Configure your provider by setting the environment variables below — or place them in a `.env` file in your project directory and llmwiki will load them automatically. +llmwiki calls an LLM during the compile and query phases. It does **not** require credentials for `ingest`, `view`, `lint`, `status`, or any other read-only operation. Configure your provider by setting the environment variables below - or place them in a `.env` file in your project directory and llmwiki will load them automatically. - Anthropic is the default provider — no `LLMWIKI_PROVIDER` variable is needed. Set either `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`; one is sufficient. + Anthropic is the default provider - no `LLMWIKI_PROVIDER` variable is needed. Set either `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN`; one is sufficient. ```bash export ANTHROPIC_API_KEY=sk-ant-... @@ -90,7 +90,7 @@ llmwiki calls an LLM during the compile and query phases. 
It does **not** requir export OPENAI_BASE_URL=http://localhost:8080/v1 ``` - To use a separate endpoint for embeddings (optional — defaults to the same base URL as chat): + To use a separate endpoint for embeddings (optional - defaults to the same base URL as chat): ```bash export LLMWIKI_PROVIDER=openai @@ -124,7 +124,7 @@ llmwiki calls an LLM during the compile and query phases. It does **not** requir - The `copilot` provider uses the GitHub Copilot API (`https://api.githubcopilot.com`), an OpenAI-compatible endpoint available to Copilot subscribers. It requires a GitHub OAuth token with the `copilot` scope — **classic personal access tokens (PATs) are not supported**. + The `copilot` provider uses the GitHub Copilot API (`https://api.githubcopilot.com`), an OpenAI-compatible endpoint available to Copilot subscribers. It requires a GitHub OAuth token with the `copilot` scope - **classic personal access tokens (PATs) are not supported**. First, refresh your `gh` CLI token to include the required scope: @@ -140,7 +140,7 @@ llmwiki calls an LLM during the compile and query phases. It does **not** requir export LLMWIKI_MODEL=gpt-4o # optional; gpt-4o is the default ``` - Available models (use dots, not dashes in names): `gpt-4o`, `gpt-4o-mini`, `claude-sonnet-4.5`, `claude-sonnet-4.6`, `claude-opus-4.5`, `gemini-2.5-pro`, and others — availability depends on your Copilot plan. + Available models (use dots, not dashes in names): `gpt-4o`, `gpt-4o-mini`, `claude-sonnet-4.5`, `claude-sonnet-4.6`, `claude-opus-4.5`, `gemini-2.5-pro`, and others - availability depends on your Copilot plan. The GitHub Copilot API does not expose an embeddings endpoint. `llmwiki query` will fall back to full-index lexical selection without semantic ranking. For embedding-dependent workflows, switch to the `openai` provider and supply `OPENAI_API_KEY`. 
diff --git a/docs/introduction.mdx b/docs/introduction.mdx index 1c3b3cc..f8aca81 100644 --- a/docs/introduction.mdx +++ b/docs/introduction.mdx @@ -5,10 +5,10 @@ description: "llmwiki turns URLs, files, and session exports into a persistent, ---

- Breaking News: llmwiki 0.10.0 supports Open Knowledge Format + Breaking News: llmwiki 0.10.0 supports Open Knowledge Format

-llmwiki is a knowledge compiler. You point it at your sources — research papers, documentation sites, notes, session exports — and it uses an LLM pipeline to compile them into a structured, interlinked wiki that you can browse, search, query, and connect to AI agents. Unlike retrieval-augmented generation (RAG), llmwiki compiles your knowledge once into a durable artifact that compounds over time: concepts get their own typed pages, links form a navigable graph, and every claim traces back to its source. +llmwiki is a knowledge compiler. You point it at your sources - research papers, documentation sites, notes, session exports - and it uses an LLM pipeline to compile them into a structured, interlinked wiki that you can browse, search, query, and connect to AI agents. Unlike retrieval-augmented generation (RAG), llmwiki compiles your knowledge once into a durable artifact that compounds over time: concepts get their own typed pages, links form a navigable graph, and every claim traces back to its source. llmwiki also supports [Google Cloud's Open Knowledge Format initiative](https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/). You can export a compiled wiki as an OKF bundle, import OKF bundles from other tools through the review queue, and re-export foreign bundles while preserving producer metadata and llmwiki provenance. @@ -32,17 +32,17 @@ llmwiki also supports [Google Cloud's Open Knowledge Format initiative](https:// ## What llmwiki gives you -**A compiled wiki, not raw chunks.** Instead of storing document chunks and re-discovering relationships at every query, llmwiki runs a two-phase LLM pipeline that extracts concepts, merges duplicates across sources, and generates typed wiki pages — each with citations back to the original source lines. 
+**A compiled wiki, not raw chunks.** Instead of storing document chunks and re-discovering relationships at every query, llmwiki runs a two-phase LLM pipeline that extracts concepts, merges duplicates across sources, and generates typed wiki pages - each with citations back to the original source lines. -**Semantic search that gets smarter.** After compiling, llmwiki builds chunk-level embeddings. When you run `llmwiki query`, it narrows hundreds of pages to a top-K via cosine similarity, reranks with BM25, and expands along the wikilink graph — giving you a tight, citation-traceable evidence pack. +**Semantic search that gets smarter.** After compiling, llmwiki builds chunk-level embeddings. When you run `llmwiki query`, it narrows hundreds of pages to a top-K via cosine similarity, reranks with BM25, and expands along the wikilink graph - giving you a tight, citation-traceable evidence pack. -**A local web viewer.** `llmwiki view` opens your compiled wiki in a browser — sidebar navigation, full-text search, a force-directed page graph, and provenance chips on every paragraph. +**A local web viewer.** `llmwiki view` opens your compiled wiki in a browser - sidebar navigation, full-text search, a force-directed page graph, and provenance chips on every paragraph. **Agent-ready via MCP.** `llmwiki serve` exposes the full pipeline to Claude Desktop, Cursor, Claude Code, and any MCP-compatible agent. Agents can ingest sources, compile, query, lint, and retrieve context packs without touching the CLI. **Open Knowledge Format round-trip.** `llmwiki export --target okf` writes a Google OKF-style bundle with page docs, references, and an activity log. `llmwiki import --okf` stages external bundles for review by default, so third-party knowledge does not touch your live wiki until you approve it. -**Programmatic control via SDK.** `createWiki({ root })` drives the entire pipeline in-process — no shelling out, no console noise, fully typed. 
+**Programmatic control via SDK.** `createWiki({ root })` drives the entire pipeline in-process - no shelling out, no console noise, fully typed. ## Who uses llmwiki diff --git a/docs/quickstart.mdx b/docs/quickstart.mdx index 6f480f5..00af30b 100644 --- a/docs/quickstart.mdx +++ b/docs/quickstart.mdx @@ -4,7 +4,7 @@ sidebarTitle: "Quickstart" description: "Install llmwiki, set your API key, and run llmwiki quickstart to compile a URL or local file into a browsable, interlinked wiki in minutes." --- -This quickstart walks you through installing llmwiki, setting your API key, and running `llmwiki quickstart` against a real source — a URL or a local file — to produce a browsable, interlinked wiki in a single command. By the end you'll have a compiled wiki you can query with natural language and explore in a local browser UI. +This quickstart walks you through installing llmwiki, setting your API key, and running `llmwiki quickstart` against a real source - a URL or a local file - to produce a browsable, interlinked wiki in a single command. By the end you'll have a compiled wiki you can query with natural language and explore in a local browser UI. llmwiki requires **Node.js >= 24**. Run `node --version` to check. If you're on an older version, update via [nodejs.org](https://nodejs.org) or a version manager such as `nvm` or `fnm` before continuing. @@ -35,7 +35,7 @@ This quickstart walks you through installing llmwiki, setting your API key, and export ANTHROPIC_API_KEY=sk-ant-... ``` - If your Anthropic-compatible gateway expects a different header, use `ANTHROPIC_AUTH_TOKEN` instead — either variable satisfies Anthropic authentication. You only need one. + If your Anthropic-compatible gateway expects a different header, use `ANTHROPIC_AUTH_TOKEN` instead - either variable satisfies Anthropic authentication. You only need one. 
```bash # Alternative: use ANTHROPIC_AUTH_TOKEN @@ -46,7 +46,7 @@ This quickstart walks you through installing llmwiki, setting your API key, and
- Create a new directory for your wiki, then run `llmwiki quickstart` with any supported source — a URL, a local markdown file, a PDF, or even a YouTube link: + Create a new directory for your wiki, then run `llmwiki quickstart` with any supported source - a URL, a local markdown file, a PDF, or even a YouTube link: ```bash mkdir my-wiki && cd my-wiki @@ -99,13 +99,13 @@ This quickstart walks you through installing llmwiki, setting your API key, and llmwiki view --open ``` - The viewer binds to `127.0.0.1` and is private by default. It renders `wiki/` without mutating any files — all writes go through the CLI. + The viewer binds to `127.0.0.1` and is private by default. It renders `wiki/` without mutating any files - all writes go through the CLI.
- Not sure what to do next after setup? Run `llmwiki next` from your project directory. It inspects the current project state and recommends the single most useful follow-up command — whether that's adding more sources, recompiling after a file change, running `llmwiki lint`, or reviewing pending candidates. + Not sure what to do next after setup? Run `llmwiki next` from your project directory. It inspects the current project state and recommends the single most useful follow-up command - whether that's adding more sources, recompiling after a file change, running `llmwiki lint`, or reviewing pending candidates. ## What's next diff --git a/docs/troubleshooting/faq.mdx b/docs/troubleshooting/faq.mdx index f1fe7ea..3ceed89 100644 --- a/docs/troubleshooting/faq.mdx +++ b/docs/troubleshooting/faq.mdx @@ -1,10 +1,10 @@ --- -title: "llmwiki FAQ — Frequently Asked Questions and Answers" +title: "llmwiki FAQ - Frequently Asked Questions and Answers" sidebarTitle: "FAQ" description: "Answers to common llmwiki questions: provider setup, compilation errors, query quality, Obsidian compatibility, and scale considerations." --- -Whether you're hitting your first `ProviderUnavailableError` or fine-tuning a large corpus workflow, this page covers the questions that come up most often. Each answer links out to deeper reference pages where relevant. If your question isn't here, check the [GitHub Issues](https://github.com/atomicstrata/llm-wiki-compiler/issues) tracker — it's the best place to surface new problems and search prior reports. +Whether you're hitting your first `ProviderUnavailableError` or fine-tuning a large corpus workflow, this page covers the questions that come up most often. Each answer links out to deeper reference pages where relevant. If your question isn't here, check the [GitHub Issues](https://github.com/atomicstrata/llm-wiki-compiler/issues) tracker - it's the best place to surface new problems and search prior reports. @@ -48,7 +48,7 @@ Yes. 
llmwiki supports three keyless or non-Anthropic providers: | Provider | How to activate | Notes | |---|---|---| -| `claude-agent` | `LLMWIKI_PROVIDER=claude-agent` | Uses your local Claude Code login (OAuth/subscription). No `ANTHROPIC_API_KEY` needed — if `claude` runs in your terminal, this works. | +| `claude-agent` | `LLMWIKI_PROVIDER=claude-agent` | Uses your local Claude Code login (OAuth/subscription). No `ANTHROPIC_API_KEY` needed - if `claude` runs in your terminal, this works. | | `ollama` | `LLMWIKI_PROVIDER=ollama` | Runs entirely against a local Ollama instance. Set `LLMWIKI_MODEL` and `OLLAMA_HOST`. | | `copilot` | `LLMWIKI_PROVIDER=copilot` | Uses your GitHub Copilot subscription. Requires an OAuth token with the `copilot` scope (`gh auth refresh --scopes copilot`). Classic PATs are not supported. | @@ -63,7 +63,7 @@ Review the [provider terms of service](https://www.anthropic.com/legal/consumer- -Set `LLMWIKI_PROVIDER` (and any required credentials for the new provider) and re-run compile. llmwiki's incremental hash-based change detection means **only sources whose content has changed since the last compile are sent through the LLM pipeline again** — your existing pages are not lost or regenerated wholesale. +Set `LLMWIKI_PROVIDER` (and any required credentials for the new provider) and re-run compile. llmwiki's incremental hash-based change detection means **only sources whose content has changed since the last compile are sent through the LLM pipeline again** - your existing pages are not lost or regenerated wholesale. ```bash # Switch from Anthropic to Ollama @@ -88,7 +88,7 @@ If the first compile is still slower than expected, consider: - **Starting smaller:** use `llmwiki quickstart ` to ingest and compile a single source first, then add the rest incrementally. - + `llmwiki query` uses a hybrid retrieval stack: semantic chunk embeddings narrow the candidate set, BM25 reranks it, and wikilink-graph expansion adds neighbors. 
If semantic retrieval isn't working, results fall back to lexical (full-index BM25) selection. Check whether an embedding store exists: @@ -113,7 +113,7 @@ llmwiki eval Yes. The `wiki/` directory produced by llmwiki is Obsidian-compatible by design. - **`[[wikilinks]]`** are generated in slug-based form (`[[slug|Display Title]]`) so Obsidian resolves the file directly, even when the display title differs from the slug. -- **Alias-aware resolution:** a `[[term]]` link resolves to any page that declares `term` in its `aliases` frontmatter, not just the page whose slug matches. This means links survive renames and synonyms — rename a page and declare the old name as an alias, and all existing links continue to work. +- **Alias-aware resolution:** a `[[term]]` link resolves to any page that declares `term` in its `aliases` frontmatter, not just the page whose slug matches. This means links survive renames and synonyms - rename a page and declare the old name as an alias, and all existing links continue to work. - **Tags and Map of Content:** compiled pages carry LLM-extracted tags, and `wiki/MOC.md` groups concept pages by tag for Obsidian's graph view. Simply open `wiki/` (or the project root) as an Obsidian vault. @@ -139,7 +139,7 @@ The `--lang` flag wins over `LLMWIKI_OUTPUT_LANG` when both are set. The flag al -When a source file is deleted, any wiki pages that were produced exclusively from that source become **orphaned** — their source is gone and they can no longer be updated. Pages that had multiple contributing sources are marked **stale** if at least one owner is still present. +When a source file is deleted, any wiki pages that were produced exclusively from that source become **orphaned** - their source is gone and they can no longer be updated. Pages that had multiple contributing sources are marked **stale** if at least one owner is still present. 
llmwiki detects this automatically by comparing the hashes recorded in `.llmwiki/state.json` against the current state of `sources/` on disk. No recompile is needed to detect it. @@ -178,11 +178,11 @@ export LLMWIKI_PROMPT_BUDGET_CHARS=400000 llmwiki compile ``` -Lexical fallback kicks in automatically when no embedding store is present, so even before embeddings are built, query still works — just without semantic ranking. +Lexical fallback kicks in automatically when no embedding store is present, so even before embeddings are built, query still works - just without semantic ranking. - -By default, `llmwiki view` binds to `127.0.0.1` (loopback only). This is intentional — the viewer renders your local source files and wiki pages, and exposing it on a network interface requires an explicit opt-in. + +By default, `llmwiki view` binds to `127.0.0.1` (loopback only). This is intentional - the viewer renders your local source files and wiki pages, and exposing it on a network interface requires an explicit opt-in. To allow access from another machine on your local network, you must provide **both** flags together: @@ -229,7 +229,7 @@ They serve different downstream use cases: | Command | What it does | |---|---| | `llmwiki query "question"` | Runs the full retrieval pipeline and then **generates a grounded natural-language answer** using an LLM. Optionally saves the answer as a wiki page with `--save`. Requires provider credentials. | -| `llmwiki context "prompt"` | Builds and returns an **evidence pack** — primary pages, semantic chunk hits, graph neighbors, citations, warnings, and suggested actions — but **does not generate an answer**. The pack is ready for you or an agent to reason over directly. Falls back to lexical retrieval when no embeddings exist. | +| `llmwiki context "prompt"` | Builds and returns an **evidence pack** - primary pages, semantic chunk hits, graph neighbors, citations, warnings, and suggested actions - but **does not generate an answer**. 
The pack is ready for you or an agent to reason over directly. Falls back to lexical retrieval when no embeddings exist. | Use `llmwiki query` when you want a direct answer. Use `llmwiki context` (or MCP `get_context_pack`) when you want to supply structured evidence to an agent or build your own reasoning layer on top. @@ -246,7 +246,7 @@ llmwiki watch Watch mode uses the same hash-based change detection as `llmwiki compile`, so only the files that actually changed are sent through the LLM pipeline. It's useful during active research or writing sessions where you're frequently updating or adding source files. -Running `llmwiki watch` continuously prevents stale pages from accumulating — each source edit is recompiled immediately rather than discovered later by `llmwiki lint`. See [Detecting and Repairing Stale Pages](/troubleshooting/stale-pages) for more on the stale-page lifecycle. +Running `llmwiki watch` continuously prevents stale pages from accumulating - each source edit is recompiled immediately rather than discovered later by `llmwiki lint`. See [Detecting and Repairing Stale Pages](/troubleshooting/stale-pages) for more on the stale-page lifecycle. diff --git a/docs/troubleshooting/stale-pages.mdx b/docs/troubleshooting/stale-pages.mdx index 6aa3699..fb95539 100644 --- a/docs/troubleshooting/stale-pages.mdx +++ b/docs/troubleshooting/stale-pages.mdx @@ -4,7 +4,7 @@ sidebarTitle: "Stale Pages" description: "llmwiki tracks source freshness and flags stale and orphaned pages. Learn how to detect them with lint and repair them with refresh --stale." --- -Every wiki page llmwiki generates records the source files — and their content hashes — that produced it. At compile time, those hashes are written to `.llmwiki/state.json`. On any subsequent command, llmwiki compares the recorded hashes against the current state of `sources/` on disk. 
This comparison is **source freshness tracking**: a lightweight, on-demand signal that tells you whether each page still accurately reflects the sources it was built from, without requiring a full recompile to find out. +Every wiki page llmwiki generates records the source files - and their content hashes - that produced it. At compile time, those hashes are written to `.llmwiki/state.json`. On any subsequent command, llmwiki compares the recorded hashes against the current state of `sources/` on disk. This comparison is **source freshness tracking**: a lightweight, on-demand signal that tells you whether each page still accurately reflects the sources it was built from, without requiring a full recompile to find out. When sources drift out of sync with the compiled wiki, llmwiki surfaces the issue clearly so you can repair it deliberately rather than discovering it through stale query results. @@ -14,19 +14,19 @@ When sources drift out of sync with the compiled wiki, llmwiki surfaces the issu A page is **stale** when at least one of its owning sources still exists on disk but its content has changed since the last compile. The recorded hash in `.llmwiki/state.json` no longer matches the file's current content hash. The page exists and is readable, but it may no longer accurately represent the current state of the source material. -A page with multiple contributing sources is also marked stale when a subset of those sources was deleted — the surviving sources are still present, but the page can no longer be fully regenerated without them. +A page with multiple contributing sources is also marked stale when a subset of those sources was deleted - the surviving sources are still present, but the page can no longer be fully regenerated without them. ### Orphaned A page is **orphaned** when every source that produced it has been deleted from `sources/`. The page exists on disk but has no living source to regenerate it from. 
Orphaned pages are flagged by `llmwiki lint` and are candidates for cleanup. -Query pages (saved answers written by `llmwiki query --save`) are never marked stale or orphaned — they are generated answers, not source projections. Their freshness status is always `unverified`. +Query pages (saved answers written by `llmwiki query --save`) are never marked stale or orphaned - they are generated answers, not source projections. Their freshness status is always `unverified`. ## How llmwiki detects freshness -Freshness is derived on demand. There is no background watcher or daemon involved. When you run a command that checks freshness — `llmwiki lint`, `llmwiki status`, or `llmwiki refresh --stale` — llmwiki: +Freshness is derived on demand. There is no background watcher or daemon involved. When you run a command that checks freshness - `llmwiki lint`, `llmwiki status`, or `llmwiki refresh --stale` - llmwiki: 1. Reads `.llmwiki/state.json`, which records the per-source content hashes from the last compile along with each source's ownership map (which concept slugs it produced). 2. For each source in the state file, checks whether the file still exists on disk and hashes its current content if so. @@ -37,7 +37,7 @@ Freshness is derived on demand. There is no background watcher or daemon involve - If any owning source was deleted, or any live owning source has a changed hash → `stale` - If all live owning sources match their recorded hashes → `fresh` -This hashing pass happens once per command and is shared by every consumer — lint, export, MCP tools, the viewer — so freshness is never computed redundantly in a single run. +This hashing pass happens once per command and is shared by every consumer - lint, export, MCP tools, the viewer - so freshness is never computed redundantly in a single run. ## Where staleness is surfaced @@ -47,7 +47,7 @@ You don't need to run a dedicated freshness check to know about stale pages. 
The **`llmwiki status`** (and the MCP `wiki_status` tool) returns a structured snapshot that includes stale and orphaned page counts, the full list of affected slugs, and a `stateStatus` field that reports whether `.llmwiki/state.json` is `ok`, `missing`, or `corrupt`. -**Local viewer** (`llmwiki view`): each page card displays a badge for its freshness status — **STALE**, **ORPHANED**, **CONTRADICTED**, or **ARCHIVED**. The sidebar health pane shows aggregate counts. You can filter the page list by freshness axis. If `.llmwiki/state.json` is missing or corrupt, a corrupt-state banner appears at the top of the viewer. +**Local viewer** (`llmwiki view`): each page card displays a badge for its freshness status - **STALE**, **ORPHANED**, **CONTRADICTED**, or **ARCHIVED**. The sidebar health pane shows aggregate counts. You can filter the page list by freshness axis. If `.llmwiki/state.json` is missing or corrupt, a corrupt-state banner appears at the top of the viewer. **MCP `get_context_pack`**: the evidence pack includes a `freshnessStatus` field per page, so agents consuming the pack know which pages may be outdated before reasoning over them. @@ -61,9 +61,9 @@ You don't need to run a dedicated freshness check to know about stale pages. The `llmwiki refresh --stale` is the targeted repair command. It does three things: -1. **Recompiles the changed owning sources** — only the sources that changed and own stale pages are sent back through the two-phase LLM pipeline. Sources that are new and have never been compiled are deliberately skipped. -2. **Cleans up orphaned pages** — pages whose every source was deleted are removed from `wiki/`. This cleanup requires no LLM calls and no API key. -3. **Leaves unrelated pages untouched** — pages whose sources haven't changed are not reprocessed, even if other pages in the project are stale. +1. 
**Recompiles the changed owning sources** - only the sources that changed and own stale pages are sent back through the two-phase LLM pipeline. Sources that are new and have never been compiled are deliberately skipped. +2. **Cleans up orphaned pages** - pages whose every source was deleted are removed from `wiki/`. This cleanup requires no LLM calls and no API key. +3. **Leaves unrelated pages untouched** - pages whose sources haven't changed are not reprocessed, even if other pages in the project are stale. `llmwiki refresh --stale` does **not** pick up new source files that have never been compiled. If you've added new files to `sources/` since the last compile, run `llmwiki compile` to ingest them. Use `refresh --stale` specifically for repairing pages whose sources changed or were deleted. @@ -77,15 +77,15 @@ Before committing to a repair, preview exactly what `refresh --stale` would do: llmwiki refresh --stale --dry-run ``` -`--dry-run` prints the repair plan — which sources will be recompiled, which orphaned pages will be cleaned up — without making any LLM calls or writing anything to disk. Use this to verify the scope of the repair before running it for real. +`--dry-run` prints the repair plan - which sources will be recompiled, which orphaned pages will be cleaned up - without making any LLM calls or writing anything to disk. Use this to verify the scope of the repair before running it for real. ### Cleanup-only refreshes -When all stale pages are orphaned (every affected source was deleted, no sources merely changed), the repair requires no LLM calls — it's a filesystem cleanup only. You can run `llmwiki refresh --stale` in this case **without any provider credentials configured**. +When all stale pages are orphaned (every affected source was deleted, no sources merely changed), the repair requires no LLM calls - it's a filesystem cleanup only. You can run `llmwiki refresh --stale` in this case **without any provider credentials configured**. 
## Review policy and refresh -If your project has a review policy declared in `.llmwiki/config.json`, `llmwiki refresh --stale` honors it the same way `llmwiki compile` does. Pages that trip a hold condition (low confidence, contradicted, schema-violating, or provenance-violating) are queued as candidates in `.llmwiki/candidates/` rather than written directly to `wiki/`. Configuration is fail-closed — a malformed config aborts the refresh rather than silently disabling the policy. +If your project has a review policy declared in `.llmwiki/config.json`, `llmwiki refresh --stale` honors it the same way `llmwiki compile` does. Pages that trip a hold condition (low confidence, contradicted, schema-violating, or provenance-violating) are queued as candidates in `.llmwiki/candidates/` rather than written directly to `wiki/`. Configuration is fail-closed - a malformed config aborts the refresh rather than silently disabling the policy. ```json { @@ -158,7 +158,7 @@ This regenerates `.llmwiki/state.json` from the current `sources/` directory and -The best way to avoid accumulating stale pages is to keep `llmwiki watch` running during active editing sessions. It monitors `sources/` for changes and triggers an incremental recompile automatically whenever a file is saved — so pages stay fresh as you work rather than drifting until you notice. +The best way to avoid accumulating stale pages is to keep `llmwiki watch` running during active editing sessions. It monitors `sources/` for changes and triggers an incremental recompile automatically whenever a file is saved - so pages stay fresh as you work rather than drifting until you notice. 
```bash llmwiki watch @@ -167,5 +167,5 @@ llmwiki watch ## Related reference pages -- [CLI reference: lint and eval](/cli/lint-eval) — full documentation for `llmwiki lint` rules and `llmwiki eval` thresholds -- [CLI reference: compile and refresh](/cli/compile) — incremental compilation and the `--review` flag +- [CLI reference: lint and eval](/cli/lint-eval) - full documentation for `llmwiki lint` rules and `llmwiki eval` thresholds +- [CLI reference: compile and refresh](/cli/compile) - incremental compilation and the `--review` flag
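
Reviewer note: the freshness rules restated in the `docs/troubleshooting/stale-pages.mdx` hunks above (stale, orphaned, fresh, and query pages that are always `unverified`) can be sketched as a small pure function. This is only an illustration under stated assumptions, not llmwiki's implementation: the `PageOwnership` shape, the `classifyFreshness` name, and the choice to pass current hashes in as a map are hypothetical, because the real `.llmwiki/state.json` schema is not shown in this diff.

```typescript
// Hypothetical shapes for illustration; llmwiki's real state.json schema may differ.
type Freshness = "fresh" | "stale" | "orphaned" | "unverified";

interface PageOwnership {
  // "query" marks a saved answer written by `llmwiki query --save`.
  kind: "concept" | "query";
  // Owning source path -> content hash recorded at the last compile.
  recordedHashes: Record<string, string>;
}

// currentHashes maps each source path that still exists on disk to its
// current content hash; a deleted source is simply absent from the map.
function classifyFreshness(
  page: PageOwnership,
  currentHashes: Map<string, string>,
): Freshness {
  // Saved answers are never stale or orphaned.
  if (page.kind === "query") return "unverified";

  const owners = Object.keys(page.recordedHashes);
  const live = owners.filter((path) => currentHashes.has(path));

  // Every owning source was deleted.
  if (live.length === 0) return "orphaned";
  // Some, but not all, owning sources were deleted.
  if (live.length < owners.length) return "stale";

  // All owners are live: stale if any hash differs from the recorded one.
  const changed = live.some(
    (path) => currentHashes.get(path) !== page.recordedHashes[path],
  );
  return changed ? "stale" : "fresh";
}

// Example: b.md was edited after the last compile, c.md was deleted.
const onDisk = new Map<string, string>([
  ["sources/a.md", "h1"],
  ["sources/b.md", "h2-edited"],
]);
const editedPage: PageOwnership = {
  kind: "concept",
  recordedHashes: { "sources/b.md": "h2" },
};
console.log(classifyFreshness(editedPage, onDisk));
```

Keeping the function free of filesystem access mirrors the docs' description of a single hashing pass per command whose result is shared by lint, export, MCP tools, and the viewer; how llmwiki actually structures that pass is not visible here.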