Rule-candidate extraction & export pipeline (W1–W5) by ethanj · Pull Request #77 · atomicstrata/llm-wiki-compiler

ethanj · 2026-05-31T23:22:10Z

Rule-candidate extraction & export pipeline (W1–W5)

Implements the producer side of the AtomicMemory learning loop: the compiler extracts machine-actionable rule candidates from sources and exports them for a downstream rule importer to consume.

What's in here

W1 documented stable sources/*.md input contract (SOURCES_CONTRACT.md).
W2 rule-candidate extraction → review/approve → export (rules extract|list|approve|reject|export). Extraction is gated by its own .llmwiki/rule-state.json cursor so approvals are never overwritten on re-run, and a producer-side validator enforces the downstream rule-import contract (id/category alphabet, field caps, evidence safety) so nothing un-importable is emitted.
W3 schemaVersion on the export envelope.
W4 per-page compile-time provenance — modelId, promptVersion, contentHash, sourceHashes stamped when a page is compiled rather than resolved at export time, so lineage reflects the model that actually produced each page.
W5 programmatic incremental compileDelta, keyed on (pageDirectory, slug).
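As an illustration of why W5 keys on the pair rather than the bare slug, here is a minimal sketch of composite-key matching. `PageRef`, `pageKey`, `selectChangedPages`, and the directory names `concepts` and `queries` are hypothetical and do not come from the PR; only the idea of matching on (pageDirectory, slug) does.

```typescript
// Hypothetical shapes for illustration; the real ExportPage type is not shown in this PR page.
interface PageRef {
  pageDirectory: string;
  slug: string;
}

// Build a composite key. NUL cannot occur in a path segment, so it is a safe separator.
function pageKey(ref: PageRef): string {
  return `${ref.pageDirectory}\u0000${ref.slug}`;
}

// Keep only the pages whose (pageDirectory, slug) pair was reported as changed.
function selectChangedPages<T extends PageRef>(all: T[], changed: PageRef[]): T[] {
  const keys = new Set(changed.map(pageKey));
  return all.filter((page) => keys.has(pageKey(page)));
}

const allPages: PageRef[] = [
  { pageDirectory: "concepts", slug: "caching" },
  { pageDirectory: "queries", slug: "caching" }, // same slug, different directory
];
const delta = selectChangedPages(allPages, [{ pageDirectory: "concepts", slug: "caching" }]);
```

With a bare-slug match both entries above would be selected; with the pair as key only the `concepts` page is.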

Verification

npm test, npm run build, npx tsc --noEmit, and npx fallow all green.

Add EXPORT_SCHEMA_VERSION = 1 const and schemaVersion: number to JsonExportDocument so downstream consumers (Radar) can pin a contract. buildJsonExport always emits schemaVersion at position 1 in the envelope. Test asserts schemaVersion === EXPORT_SCHEMA_VERSION === 1.
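A consumer that pins the contract might guard on the version before reading anything else. The sketch below assumes a simplified envelope; only `EXPORT_SCHEMA_VERSION` and the `schemaVersion` field are named in this commit, while `ExportEnvelope`, its `pages` field, and `assertSupportedSchema` are hypothetical.

```typescript
// Value taken from the commit message above.
const EXPORT_SCHEMA_VERSION = 1;

// Assumed, simplified envelope; the real JsonExportDocument has more fields.
interface ExportEnvelope {
  schemaVersion: number;
  pages: unknown[]; // hypothetical field name
}

// Reject any document whose version the consumer was not written against.
function assertSupportedSchema(doc: ExportEnvelope): ExportEnvelope {
  if (doc.schemaVersion !== EXPORT_SCHEMA_VERSION) {
    throw new Error(
      `Unsupported export schemaVersion ${doc.schemaVersion}; expected ${EXPORT_SCHEMA_VERSION}`
    );
  }
  return doc;
}
```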

Document the sources/*.md + YAML-frontmatter format as a stable ingest contract a programmatic producer (e.g. Radar) can write to drive compilation: required/optional frontmatter fields, filename/slug rules, MAX_SOURCE_CHARS, and the hash-gated change-detection behavior. Notes a git-log adapter as a future producer; no new connector ships with W1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…content/source hashes (radar W4)

Stamp the JSON export with auditable compilation provenance so a downstream consumer can tie a page back to the model, prompt version, and source bytes that produced it:

- Envelope: add modelId (resolved from the active LLM client config via resolveActiveModelId, overridable for tests) and promptVersion (PROMPT_VERSION const near the extraction tool). schemaVersion kept.
- Per ExportPage: add contentHash (deterministic sha256 of the body) and sourceHashes (the per-source sha256 digests already recorded in .llmwiki/state.json for change detection, surfaced via a new src/export/provenance.ts helper, mapped from the page sources list).

All fields are additive and forward-compatible.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
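The per-page hashing described above can be sketched with Node's built-in `crypto` module. This is an illustration under assumptions, not the contents of src/export/provenance.ts: `sha256Hex`, `stampProvenance`, and the `Record<string, string>` shape for `sourceHashes` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Deterministic SHA-256 of a UTF-8 string, hex-encoded.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Assumed shape: source name -> digest. The real export may use a different structure.
interface PageProvenance {
  contentHash: string;
  sourceHashes: Record<string, string>;
}

// Hash the page body and look up the already-recorded digest for each listed source.
// Sources missing from the recorded state are skipped rather than invented.
function stampProvenance(
  body: string,
  sources: string[],
  recorded: Record<string, string>
): PageProvenance {
  const sourceHashes: Record<string, string> = {};
  for (const name of sources) {
    const digest = recorded[name];
    if (digest !== undefined) sourceHashes[name] = digest;
  }
  return { contentHash: sha256Hex(body), sourceHashes };
}
```

Because the digest depends only on the body bytes, re-exporting an unchanged page yields the same `contentHash`.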

…r W5)

Add compileDelta(root, options): a library entry point that runs the normal hash-gated compile and returns only the ExportPages changed in that run, instead of the full corpus.

Driven entirely by the existing detectChanges / .llmwiki/state.json SHA-256 source-change detection and the slugs compileAndReport already reports, so a second call with an up-to-date state returns an empty delta, and adding/editing a source yields only that page. The full compile behavior is unchanged.

Returned pages carry the full provenance-stamped ExportPage shape so a consumer (Radar) can ship deltas through the same contract as a full export.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
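The hash-gated behavior described above (an up-to-date state yields an empty delta; an edited source yields only that source) can be illustrated in isolation. This sketch does not reproduce detectChanges or compileDelta; `changedSourceNames` and the in-memory state object are hypothetical stand-ins for the .llmwiki/state.json mechanism.

```typescript
import { createHash } from "node:crypto";

// Compare each source's SHA-256 against the recorded digest, return the names
// that differ, and update the state so the next call sees them as unchanged.
function changedSourceNames(
  sources: Record<string, string>, // source name -> current content
  state: Record<string, string>    // source name -> last recorded digest (mutated)
): string[] {
  const changed: string[] = [];
  for (const name of Object.keys(sources)) {
    const digest = createHash("sha256").update(sources[name], "utf8").digest("hex");
    if (state[name] !== digest) {
      changed.push(name);
      state[name] = digest;
    }
  }
  return changed;
}

const hashState: Record<string, string> = {};
const firstRun = changedSourceNames({ "a.md": "alpha", "b.md": "beta" }, hashState);
const secondRun = changedSourceNames({ "a.md": "alpha", "b.md": "beta" }, hashState);
const thirdRun = changedSourceNames({ "a.md": "alpha", "b.md": "beta, edited" }, hashState);
```

Here `firstRun` lists both sources, `secondRun` is empty, and `thirdRun` lists only `b.md`.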

… export (radar W2)

Add the learning-loop "recommend rules" producer: a `rules` mode that LLM-extracts machine-actionable RuleCandidate records from changed sources, supports a review/approve/reject lifecycle parallel to the concept review queue, and exports the candidates as a JSON array matching Atomic Radar's import contract exactly (camelCase keys, tagged evidence, lowercase status/confidence).

- rule-types.ts: RuleCandidate / ProposedRule / EvidenceRef / provenance shapes.
- rule-prompts.ts: RULE_EXTRACTION_TOOL + prompt + parser (RULE_PROMPT_VERSION v1).
- rule-extractor.ts: changed-source-gated extraction; stamps provenance.modelId via W4 resolveActiveModelId and modelVersion via the rule-prompt version.
- rule-candidates.ts: persistence + status flip + archive under .llmwiki/rule-candidates/.
- rule-candidates-json.ts: scoped JSON-array export for Radar.
- commands/rules.ts + CLI: rules extract|list|approve|reject|export.
- candidate-store.ts: shared list/archive primitives (dedupes concept + rule queues).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Addresses the review findings on the W2/W4/W5 radar surfaces: the producer was emitting candidates Radar silently rejects, claiming export-time model ids as compile lineage, and clobbering human approvals on re-extraction.

Rule extraction (W2):

- Track a separate `.llmwiki/rule-state.json` cursor so `rules extract` no longer borrows the concept compiler's change gate. An already-compiled source now yields candidates, and an unchanged source is not reprocessed every run.
- Never overwrite an approved/rejected candidate on re-extraction; preserved decisions are reported as notes.
- Candidate ids carry a content-hash slug suffix and a sanitized `[a-z0-9_]` category segment, so distinct rules never collide on disk and every id passes Radar's import regex (multi-word categories no longer produce hyphenated segments Radar refuses).
- Add a producer-side validator mirroring Radar's import gate (id/category alphabet, field caps, https-only urls, evidence path safety); failing candidates are dropped with a note instead of exported.
- Number the source before prompting and clip it to LLMWIKI_PROMPT_BUDGET_CHARS, so evidence line spans reference anchors the model actually saw and a large source can't blow the prompt window. Inverted and out-of-bounds spans are dropped.

Export provenance (W4):

- Stamp modelId/promptVersion into page frontmatter at compile time and surface them per page in the JSON export, replacing the envelope-level export-time env read that could attribute a page to a model that never produced it.

Compile delta (W5):

- Match changed pages on (pageDirectory, slug), not bare slug, so a saved query sharing a slug with a changed concept is never mis-included.

Also: dedupe the candidate file-id/extension helpers, simplify the list command's scope filter, and add subprocess integration coverage for the rules CLI (list, missing-id approve/reject, invalid scope, export output, extract credential failure).
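The source-numbering and span-filtering step from the W2 list can be sketched as follows. This is a simplified illustration, not the code in rule-extractor.ts: `numberAndClip`, `keepValidSpans`, the `LineSpan` field names, and the `N: ` line prefix are all assumptions, and the budget is passed as a plain parameter rather than read from LLMWIKI_PROMPT_BUDGET_CHARS.

```typescript
// Hypothetical span shape; 1-based, inclusive line numbers.
interface LineSpan {
  startLine: number;
  endLine: number;
}

// Prefix each line with its number and stop before the character budget is exceeded,
// so every line number in the prompt refers to a line the model actually saw.
function numberAndClip(source: string, budgetChars: number): { prompt: string; lineCount: number } {
  const lines = source.split("\n");
  const kept: string[] = [];
  let used = 0;
  for (let i = 0; i < lines.length; i++) {
    const numbered = `${i + 1}: ${lines[i]}`;
    const cost = numbered.length + 1; // +1 accounts for the joining newline
    if (used + cost > budgetChars) break;
    kept.push(numbered);
    used += cost;
  }
  return { prompt: kept.join("\n"), lineCount: kept.length };
}

// Drop spans that are inverted, non-integer, or point outside the clipped source.
function keepValidSpans(spans: LineSpan[], lineCount: number): LineSpan[] {
  return spans.filter(
    (s) =>
      Number.isInteger(s.startLine) &&
      Number.isInteger(s.endLine) &&
      s.startLine >= 1 &&
      s.endLine >= s.startLine &&
      s.endLine <= lineCount
  );
}
```

Validating spans against the clipped line count, rather than the full source length, is what ties evidence back to text that was actually in the prompt.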

# Conflicts:
#   src/export/collect.ts
#   src/export/types.ts

ethanj and others added 6 commits May 31, 2026 00:28

Merge branch 'main' into feat/radar-powerup-p0

94bce87

ethanj marked this pull request as draft June 2, 2026 02:18

ethanj added 2 commits June 4, 2026 02:22

docs: genericize internal project references in comments and docs

23f703b

ethanj changed the title ~~Radar powerup: llm-wiki-compiler (W1–W5)~~ Rule-candidate extraction & export pipeline (W1–W5) Jun 5, 2026

ethanj marked this pull request as ready for review June 7, 2026 00:08

ethanj added 2 commits June 6, 2026 17:16

Merge remote-tracking branch 'origin/main' into feat/radar-powerup-p0

cd6325c

# Conflicts:
#   src/export/collect.ts
#   src/export/types.ts

fix(pr77): harden rule export after main merge

b5d7c76

ethanj merged commit e9bdf48 into main Jun 7, 2026
2 checks passed

ethanj deleted the feat/radar-powerup-p0 branch June 7, 2026 00:31


ethanj merged 10 commits into main from feat/radar-powerup-p0
