Skip to content

Rule-candidate extraction & export pipeline (W1–W5)#77

Merged
ethanj merged 10 commits into
mainfrom
feat/radar-powerup-p0
Jun 7, 2026
Merged

Rule-candidate extraction & export pipeline (W1–W5)#77
ethanj merged 10 commits into
mainfrom
feat/radar-powerup-p0

Conversation

@ethanj

@ethanj ethanj commented May 31, 2026

Copy link
Copy Markdown
Contributor

Rule-candidate extraction & export pipeline (W1–W5)

Implements the producer side of the AtomicMemory learning loop: the compiler extracts machine-actionable rule candidates from sources and exports them for a downstream rule importer to consume.

What's in here

  • W1 documented stable sources/*.md input contract (SOURCES_CONTRACT.md).
  • W2 rule-candidate extraction → review/approve → export (rules extract|list|approve|reject|export). Extraction is gated by its own .llmwiki/rule-state.json cursor so approvals are never overwritten on re-run, and a producer-side validator enforces the downstream rule-import contract (id/category alphabet, field caps, evidence safety) so nothing un-importable is emitted.
  • W3 schemaVersion on the export envelope.
  • W4 per-page compile-time provenance — modelId, promptVersion, contentHash, sourceHashes stamped when a page is compiled rather than resolved at export time, so lineage reflects the model that actually produced each page.
  • W5 programmatic incremental compileDelta, keyed on (pageDirectory, slug).

Verification

npm test, npm run build, npx tsc --noEmit, and npx fallow all green.

ethanj and others added 6 commits May 31, 2026 00:28
Add EXPORT_SCHEMA_VERSION = 1 const and schemaVersion: number to
JsonExportDocument so downstream consumers (Radar) can pin a contract.
buildJsonExport always emits schemaVersion at position 1 in the envelope.
Test asserts schemaVersion === EXPORT_SCHEMA_VERSION === 1.
Document the sources/*.md + YAML-frontmatter format as a stable ingest
contract a programmatic producer (e.g. Radar) can write to drive
compilation: required/optional frontmatter fields, filename/slug rules,
MAX_SOURCE_CHARS, and the hash-gated change-detection behavior. Notes a
git-log adapter as a future producer; no new connector ships with W1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…content/source hashes (radar W4)

Stamp the JSON export with auditable compilation provenance so a
downstream consumer can tie a page back to the model, prompt version,
and source bytes that produced it:

- Envelope: add modelId (resolved from the active LLM client config via
  resolveActiveModelId, overridable for tests) and promptVersion
  (PROMPT_VERSION const near the extraction tool). schemaVersion kept.
- Per ExportPage: add contentHash (deterministic sha256 of the body) and
  sourceHashes (the per-source sha256 digests already recorded in
  .llmwiki/state.json for change detection, surfaced via a new
  src/export/provenance.ts helper, mapped from the page sources list).

All fields are additive and forward-compatible.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r W5)

Add compileDelta(root, options): a library entry point that runs the
normal hash-gated compile and returns only the ExportPages changed in
that run, instead of the full corpus. Driven entirely by the existing
detectChanges / .llmwiki/state.json SHA-256 source-change detection and
the slugs compileAndReport already reports — so a second call with an
up-to-date state returns an empty delta, and adding/editing a source
yields only that page. The full compile behavior is unchanged.

Returned pages carry the full provenance-stamped ExportPage shape so a
consumer (Radar) can ship deltas through the same contract as a full
export.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… export (radar W2)

Add the learning-loop "recommend rules" producer: a `rules` mode that LLM-extracts
machine-actionable RuleCandidate records from changed sources, supports a
review/approve/reject lifecycle parallel to the concept review queue, and exports
the candidates as a JSON array matching Atomic Radar's import contract exactly
(camelCase keys, tagged evidence, lowercase status/confidence).

- rule-types.ts: RuleCandidate / ProposedRule / EvidenceRef / provenance shapes.
- rule-prompts.ts: RULE_EXTRACTION_TOOL + prompt + parser (RULE_PROMPT_VERSION v1).
- rule-extractor.ts: changed-source-gated extraction; stamps provenance.modelId via
  W4 resolveActiveModelId and modelVersion via the rule-prompt version.
- rule-candidates.ts: persistence + status flip + archive under .llmwiki/rule-candidates/.
- rule-candidates-json.ts: scoped JSON-array export for Radar.
- commands/rules.ts + CLI: rules extract|list|approve|reject|export.
- candidate-store.ts: shared list/archive primitives (dedupes concept + rule queues).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ethanj ethanj marked this pull request as draft June 2, 2026 02:18
ethanj added 2 commits June 4, 2026 02:22
Addresses the review findings on the W2/W4/W5 radar surfaces — the producer
was emitting candidates Radar silently rejects, claiming export-time model
ids as compile lineage, and clobbering human approvals on re-extraction.

Rule extraction (W2):
- Track a separate `.llmwiki/rule-state.json` cursor so `rules extract` no
  longer borrows the concept compiler's change gate. An already-compiled
  source now yields candidates, and an unchanged source is not reprocessed
  every run.
- Never overwrite an approved/rejected candidate on re-extraction; preserved
  decisions are reported as notes.
- Candidate ids carry a content-hash slug suffix and a sanitized `[a-z0-9_]`
  category segment, so distinct rules never collide on disk and every id
  passes Radar's import regex (multi-word categories no longer produce
  hyphenated segments Radar refuses).
- Add a producer-side validator mirroring Radar's import gate (id/category
  alphabet, field caps, https-only urls, evidence path safety); failing
  candidates are dropped with a note instead of exported.
- Number the source before prompting and clip it to LLMWIKI_PROMPT_BUDGET_CHARS,
  so evidence line spans reference anchors the model actually saw and a large
  source can't blow the prompt window. Inverted and out-of-bounds spans are
  dropped.

Export provenance (W4):
- Stamp modelId/promptVersion into page frontmatter at compile time and
  surface them per page in the JSON export, replacing the envelope-level
  export-time env read that could attribute a page to a model that never
  produced it.

Compile delta (W5):
- Match changed pages on (pageDirectory, slug), not bare slug, so a saved
  query sharing a slug with a changed concept is never mis-included.

Also: dedupe the candidate file-id/extension helpers, simplify the list
command's scope filter, and add subprocess integration coverage for the
rules CLI (list, missing-id approve/reject, invalid scope, export output,
extract credential failure).
@ethanj ethanj changed the title Radar powerup: llm-wiki-compiler (W1–W5) Rule-candidate extraction & export pipeline (W1–W5) Jun 5, 2026
@ethanj ethanj marked this pull request as ready for review June 7, 2026 00:08
@ethanj ethanj merged commit e9bdf48 into main Jun 7, 2026
2 checks passed
@ethanj ethanj deleted the feat/radar-powerup-p0 branch June 7, 2026 00:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant