Rule-candidate extraction & export pipeline (W1–W5) #77
Merged
Add EXPORT_SCHEMA_VERSION = 1 const and schemaVersion: number to JsonExportDocument so downstream consumers (Radar) can pin a contract. buildJsonExport always emits schemaVersion at position 1 in the envelope. Test asserts schemaVersion === EXPORT_SCHEMA_VERSION === 1.
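A minimal sketch of how a consumer might pin this contract. Only `EXPORT_SCHEMA_VERSION`, `schemaVersion`, `JsonExportDocument`, and `buildJsonExport` are names from this PR; the `pages` field, the function signature, and the `assertSupportedSchema` guard are assumptions made for illustration, and "position 1" is read here as "first key in the envelope".

```typescript
// Names from the PR: EXPORT_SCHEMA_VERSION, JsonExportDocument, buildJsonExport.
// Field shapes and assertSupportedSchema are hypothetical.
const EXPORT_SCHEMA_VERSION = 1;

interface JsonExportDocument {
  schemaVersion: number;
  pages: unknown[]; // hypothetical payload field
}

function buildJsonExport(pages: unknown[]): JsonExportDocument {
  // schemaVersion is written first so it is the first key when serialized.
  return { schemaVersion: EXPORT_SCHEMA_VERSION, pages };
}

// Hypothetical consumer-side guard: refuse envelopes from an unknown contract version.
function assertSupportedSchema(doc: JsonExportDocument): void {
  if (doc.schemaVersion !== EXPORT_SCHEMA_VERSION) {
    throw new Error(`unsupported schemaVersion ${doc.schemaVersion}`);
  }
}

const envelope = buildJsonExport([]);
assertSupportedSchema(envelope);
const firstKey = Object.keys(envelope)[0];
```

String keys keep insertion order in JavaScript objects, so emitting `schemaVersion` first lets a consumer read the version before parsing the rest of the envelope.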
Document the sources/*.md + YAML-frontmatter format as a stable ingest contract a programmatic producer (e.g. Radar) can write to drive compilation: required/optional frontmatter fields, filename/slug rules, MAX_SOURCE_CHARS, and the hash-gated change-detection behavior. Notes a git-log adapter as a future producer; no new connector ships with W1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…content/source hashes (radar W4)
Stamp the JSON export with auditable compilation provenance so a downstream consumer can tie a page back to the model, prompt version, and source bytes that produced it:
- Envelope: add modelId (resolved from the active LLM client config via resolveActiveModelId, overridable for tests) and promptVersion (PROMPT_VERSION const near the extraction tool). schemaVersion kept.
- Per ExportPage: add contentHash (deterministic sha256 of the body) and sourceHashes (the per-source sha256 digests already recorded in .llmwiki/state.json for change detection, surfaced via a new src/export/provenance.ts helper, mapped from the page sources list).
All fields are additive and forward-compatible.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
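The per-page stamping described in this commit could look roughly like the sketch below. The `ExportPage` shape is simplified, `stampProvenance` is a hypothetical name, and representing `sourceHashes` as a source-to-digest map is an assumption; the PR only states that the digests come from `.llmwiki/state.json` and are mapped from the page's sources list.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified shape; the PR's ExportPage carries more fields.
interface ExportPage {
  slug: string;
  body: string;
  sources: string[];
  contentHash: string;
  sourceHashes: Record<string, string>;
}

// Deterministic SHA-256 hex digest of a UTF-8 string.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// recordedHashes stands in for the per-source digests kept in .llmwiki/state.json.
function stampProvenance(
  page: { slug: string; body: string; sources: string[] },
  recordedHashes: Record<string, string>,
): ExportPage {
  const sourceHashes: Record<string, string> = {};
  for (const source of page.sources) {
    const digest = recordedHashes[source];
    // Sources with no recorded digest are skipped rather than guessed.
    if (digest !== undefined) sourceHashes[source] = digest;
  }
  return { ...page, contentHash: sha256Hex(page.body), sourceHashes };
}

const stamped = stampProvenance(
  { slug: "caching", body: "abc", sources: ["a.md", "missing.md"] },
  { "a.md": sha256Hex("source bytes") },
);
```

Reusing the digests already recorded for change detection, instead of re-hashing sources at export time, keeps the exported hashes tied to the bytes that were actually compiled.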
…r W5)
Add compileDelta(root, options): a library entry point that runs the normal hash-gated compile and returns only the ExportPages changed in that run, instead of the full corpus. Driven entirely by the existing detectChanges / .llmwiki/state.json SHA-256 source-change detection and the slugs compileAndReport already reports, so a second call with an up-to-date state returns an empty delta, and adding/editing a source yields only that page. The full compile behavior is unchanged. Returned pages carry the full provenance-stamped ExportPage shape so a consumer (Radar) can ship deltas through the same contract as a full export.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
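The hash-gated delta behavior can be illustrated with an in-memory sketch. This is not the PR's `compileDelta(root, options)`: `compileDeltaSketch` is a hypothetical stand-in that takes sources and state as plain objects instead of reading `sources/*.md` and `.llmwiki/state.json`, and it returns source names where the real function returns `ExportPage`s.

```typescript
import { createHash } from "node:crypto";

// Source name -> SHA-256 recorded at the last compile (stand-in for state.json).
type SourceState = Record<string, string>;

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Returns the sources whose digest differs from the recorded state,
// and records the new digest so the next run sees them as up to date.
function compileDeltaSketch(
  sources: Record<string, string>,
  state: SourceState,
): string[] {
  const changed: string[] = [];
  for (const name of Object.keys(sources)) {
    const digest = sha256Hex(sources[name]);
    if (state[name] !== digest) {
      changed.push(name); // a real compile would regenerate the page here
      state[name] = digest;
    }
  }
  return changed;
}

const state: SourceState = {};
const sources: Record<string, string> = { "a.md": "alpha", "b.md": "beta" };

const first = compileDeltaSketch(sources, state); // both sources are new
const second = compileDeltaSketch(sources, state); // state is up to date
sources["b.md"] = "beta, edited";
const third = compileDeltaSketch(sources, state); // only b.md changed
```

The second call returns an empty delta because every digest already matches the recorded state, which is the property the commit message describes.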
… export (radar W2)
Add the learning-loop "recommend rules" producer: a `rules` mode that LLM-extracts machine-actionable RuleCandidate records from changed sources, supports a review/approve/reject lifecycle parallel to the concept review queue, and exports the candidates as a JSON array matching Atomic Radar's import contract exactly (camelCase keys, tagged evidence, lowercase status/confidence).
- rule-types.ts: RuleCandidate / ProposedRule / EvidenceRef / provenance shapes.
- rule-prompts.ts: RULE_EXTRACTION_TOOL + prompt + parser (RULE_PROMPT_VERSION v1).
- rule-extractor.ts: changed-source-gated extraction; stamps provenance.modelId via W4 resolveActiveModelId and modelVersion via the rule-prompt version.
- rule-candidates.ts: persistence + status flip + archive under .llmwiki/rule-candidates/.
- rule-candidates-json.ts: scoped JSON-array export for Radar.
- commands/rules.ts + CLI: rules extract|list|approve|reject|export.
- candidate-store.ts: shared list/archive primitives (dedupes concept + rule queues).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
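To make the lifecycle and export shape concrete, here is a sketch under heavy assumptions. The PR names the `RuleCandidate` and `EvidenceRef` types but this page does not show their fields, so every field name, the status and confidence values, the `kind` tag on evidence, and the `review` and `exportCandidates` helpers are hypothetical. Reading "scoped" as "filtered by status" is also an assumption.

```typescript
// All field names and literal values below are assumptions for illustration.
type CandidateStatus = "pending" | "approved" | "rejected";

// "Tagged evidence" modeled as a discriminated union on a hypothetical `kind` tag.
type EvidenceRef =
  | { kind: "file"; path: string; startLine: number; endLine: number }
  | { kind: "url"; url: string };

interface RuleCandidate {
  id: string;
  category: string;
  status: CandidateStatus;
  confidence: "low" | "medium" | "high";
  evidence: EvidenceRef[];
}

// Status flip: returns a copy of the candidate carrying the reviewer's decision.
function review(
  candidate: RuleCandidate,
  decision: "approved" | "rejected",
): RuleCandidate {
  return { ...candidate, status: decision };
}

// Export candidates as a bare JSON array, optionally scoped to one status.
function exportCandidates(all: RuleCandidate[], scope?: CandidateStatus): string {
  const selected = scope ? all.filter((c) => c.status === scope) : all;
  return JSON.stringify(selected, null, 2);
}

const pending: RuleCandidate = {
  id: "error_handling_1a2b3c4d",
  category: "error_handling",
  status: "pending",
  confidence: "medium",
  evidence: [{ kind: "file", path: "sources/notes.md", startLine: 3, endLine: 5 }],
};

const approved = review(pending, "approved");
const exported = exportCandidates([pending, approved], "approved");
```

Exporting a bare array rather than an envelope matches the commit's description of the Radar import contract; the importer's exact field requirements are not shown here and should be checked against its schema.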
Addresses the review findings on the W2/W4/W5 radar surfaces: the producer was emitting candidates Radar silently rejects, claiming export-time model ids as compile lineage, and clobbering human approvals on re-extraction.
Rule extraction (W2):
- Track a separate `.llmwiki/rule-state.json` cursor so `rules extract` no longer borrows the concept compiler's change gate. An already-compiled source now yields candidates, and an unchanged source is not reprocessed every run.
- Never overwrite an approved/rejected candidate on re-extraction; preserved decisions are reported as notes.
- Candidate ids carry a content-hash slug suffix and a sanitized `[a-z0-9_]` category segment, so distinct rules never collide on disk and every id passes Radar's import regex (multi-word categories no longer produce hyphenated segments Radar refuses).
- Add a producer-side validator mirroring Radar's import gate (id/category alphabet, field caps, https-only urls, evidence path safety); failing candidates are dropped with a note instead of exported.
- Number the source before prompting and clip it to LLMWIKI_PROMPT_BUDGET_CHARS, so evidence line spans reference anchors the model actually saw and a large source can't blow the prompt window. Inverted and out-of-bounds spans are dropped.
Export provenance (W4):
- Stamp modelId/promptVersion into page frontmatter at compile time and surface them per page in the JSON export, replacing the envelope-level export-time env read that could attribute a page to a model that never produced it.
Compile delta (W5):
- Match changed pages on (pageDirectory, slug), not bare slug, so a saved query sharing a slug with a changed concept is never mis-included.
Also: dedupe the candidate file-id/extension helpers, simplify the list command's scope filter, and add subprocess integration coverage for the rules CLI (list, missing-id approve/reject, invalid scope, export output, extract credential failure).
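Three of the W2 fixes above (sanitized ids, numbered and clipped prompts, span validation) can be sketched as follows. All function names, the id layout, the 8-character hash suffix, the `uncategorized` fallback, and the `N: ` line prefix are assumptions; Radar's actual import regex and the PR's real helpers are not shown on this page.

```typescript
import { createHash } from "node:crypto";

// Collapse anything outside [a-z0-9_] to underscores, so a multi-word
// category never produces a hyphenated segment.
function sanitizeCategory(category: string): string {
  const cleaned = category
    .toLowerCase()
    .replace(/[^a-z0-9_]+/g, "_")
    .replace(/^_+|_+$/g, "");
  return cleaned.length > 0 ? cleaned : "uncategorized"; // fallback is an assumption
}

// Hypothetical id layout: <category>_<first 8 hex chars of sha256(rule text)>.
function candidateId(category: string, ruleText: string): string {
  const suffix = createHash("sha256")
    .update(ruleText, "utf8")
    .digest("hex")
    .slice(0, 8);
  return `${sanitizeCategory(category)}_${suffix}`;
}

// Prefix each line with its 1-based number, then keep only the whole
// lines that fit within the character budget.
function numberAndClip(
  source: string,
  budgetChars: number,
): { text: string; lineCount: number } {
  const kept: string[] = [];
  let used = 0;
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const numbered = `${i + 1}: ${lines[i]}`;
    if (used + numbered.length + 1 > budgetChars) break; // +1 for the newline
    kept.push(numbered);
    used += numbered.length + 1;
  }
  return { text: kept.join("\n"), lineCount: kept.length };
}

// A span survives only if it is not inverted and lies within the lines the model saw.
function isValidSpan(start: number, end: number, lineCount: number): boolean {
  return start >= 1 && start <= end && end <= lineCount;
}

const id = candidateId("Error Handling", "Always wrap fetch calls");
const clipped = numberAndClip("a\nb\nc", 8);
```

Validating spans against `lineCount` from the clipped text, not the full source, is what ties evidence to lines the model was actually shown.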
# Conflicts:
#   src/export/collect.ts
#   src/export/types.ts
Rule-candidate extraction & export pipeline (W1–W5)
Implements the producer side of the AtomicMemory learning loop: the compiler extracts machine-actionable rule candidates from sources and exports them for a downstream rule importer to consume.
What's in here
- `sources/*.md` input contract (`SOURCES_CONTRACT.md`).
- Rules CLI (`rules extract|list|approve|reject|export`). Extraction is gated by its own `.llmwiki/rule-state.json` cursor so approvals are never overwritten on re-run, and a producer-side validator enforces the downstream rule-import contract (id/category alphabet, field caps, evidence safety) so nothing un-importable is emitted.
- `schemaVersion` on the export envelope.
- `modelId`, `promptVersion`, `contentHash`, `sourceHashes` stamped when a page is compiled rather than resolved at export time, so lineage reflects the model that actually produced each page.
- `compileDelta`, keyed on `(pageDirectory, slug)`.
Verification
`npm test`, `npm run build`, `npx tsc --noEmit`, and `npx fallow` all green.