feat: LLM-grounded semantic layer (kb describe) + semantic grounding gate#14

Merged
v0ropaev merged 1 commit into master from feat/llm-describe
Jun 21, 2026
@v0ropaev

Owner

First slice of the LLM-grounded semantic layer (next Roadmap item). A separate, key-gated kb describe pass has an LLM write a short NL summary + structured claims for each api_route/entity in a snapshot; every claim is validated against the artifact's own grounding spans by a deterministic sub-property gate, unvalidated claims are dropped, and a description artifact is stored only if something survives — grounded on the same spans (extraction_method="llm_grounded", model_id+prompt_version in the key). Never on the offline kb index path.

What

  • kb.extract.semantic: grounding.validate_claims (deterministic, no model) + describe.describe_snapshot (orchestration; reuses kb.llm + write_grounded_artifact, which already enforces ≥1 span / is_deterministic/model_id).
  • queries.spans_for_artifact — span_id + fq_symbol_path + source text (validator input).
  • kb describe CLI (lazy LLM import, key-gated via has_llm_key).
  • MCP summarize + embed_text gain a description branch.
  • HARD gate semantic_grounding_test on a stub LLM (no API key): an adversarial fabricated claim is dropped, the grounded one stored, the description served as llm_grounded. Headline gates eight → nine — the DESIGN §9 semantic floor, enforced deterministically in CI.
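
To make the gate concrete, here is a minimal sketch of the drop-unvalidated-claims behaviour described above. It is not the code in this PR: the `Span` and `Claim` shapes, the `evidence` field, the verbatim-substring check, and the `describe` helper are all assumptions made for illustration. Only the names `validate_claims`, `span_id`, `fq_symbol_path`, `raw_text` and `llm_grounded` come from the description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    # Hypothetical shape, mirroring what queries.spans_for_artifact is said to return.
    span_id: str
    fq_symbol_path: str
    raw_text: str


@dataclass(frozen=True)
class Claim:
    # Hypothetical shape of one structured claim produced by the LLM.
    span_id: str    # span the model cites as support
    evidence: str   # literal text the model says appears in that span
    statement: str  # the natural-language claim itself


def validate_claims(claims: list[Claim], spans: list[Span]) -> list[Claim]:
    """Keep only claims whose evidence occurs verbatim in the cited span.

    Assumed rule for this sketch; the real gate may check something stricter.
    No model is involved, so the result is deterministic.
    """
    by_id = {s.span_id: s for s in spans}
    kept: list[Claim] = []
    for claim in claims:
        span = by_id.get(claim.span_id)
        if span is not None and claim.evidence and claim.evidence in span.raw_text:
            kept.append(claim)
    return kept


def describe(summary: str, claims: list[Claim], spans: list[Span]) -> dict | None:
    """Return a description record, or None when no claim survives the gate."""
    kept = validate_claims(claims, spans)
    if not kept:
        return None  # nothing grounded: store no description at all
    return {
        "kind": "description",
        "extraction_method": "llm_grounded",
        "summary": summary,
        "claims": kept,
        # Grounded on the same spans as the described artifact.
        "span_ids": [s.span_id for s in spans],
    }
```

With a span containing `def create_user(body: UserIn)`, a claim citing `UserIn` would survive while a fabricated claim citing `send_email` would be dropped. If every claim is fabricated, `describe` returns `None`, which mirrors the /internal case in Verification.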

Verification

  • ruff + mypy --strict clean; 54 eval tests pass (+1 skipped).
  • Stub end-to-end (ephemeral PG): descriptions inherit the route's cross-file grounding (routes.py + schemas.py); a route whose claims are all ungrounded (/internal) gets no description stored at all (anti-hallucination).
  • Real-LLM path is exercised by kb describe with a key (manual / nightly later); not required for CI.

Notes

kb index stays offline/deterministic (no API key). Out of scope: business-process/call-graph extraction, other description kinds, real-LLM in the CI gate. No release this cycle (accumulates in [Unreleased]).
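
As an illustration of how a key-gated command with a lazy LLM import can keep the offline path free of model dependencies, consider the sketch below. It is written under assumptions: the environment variable `KB_LLM_API_KEY`, the exit codes, the `run_describe` function and its `load_llm` hook are hypothetical, and the `has_llm_key` shown here is a stand-in, not the project's actual helper.

```python
import importlib
import os
from typing import Any, Callable, Mapping, Optional


def has_llm_key(env: Optional[Mapping[str, str]] = None) -> bool:
    # Stand-in for the project's has_llm_key; the variable name is an assumption.
    env = os.environ if env is None else env
    return bool(env.get("KB_LLM_API_KEY", "").strip())


def _import_llm() -> Any:
    # Deferred import: the LLM module is loaded only when describe actually runs.
    return importlib.import_module("kb.llm")


def run_describe(
    snapshot_id: str,
    env: Optional[Mapping[str, str]] = None,
    load_llm: Callable[[], Any] = _import_llm,
) -> int:
    """Return a process exit code (0 = ran, 2 = skipped for lack of a key)."""
    if not has_llm_key(env):
        print("kb describe: no LLM key configured, skipping")
        return 2  # hypothetical exit code
    llm = load_llm()  # only reached when a key is present
    # ... orchestration over the snapshot would go here, using `llm` ...
    del llm, snapshot_id
    return 0
```

Because the import happens inside the command rather than at module top level, a command such as `kb index` never touches the LLM module in this arrangement. Tests can pass a stub `load_llm`, so they need no API key.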

…gate

First slice of the LLM-grounded layer (Roadmap). A separate, key-gated `kb
describe` pass has an LLM write a short NL summary + structured claims for each
api_route/entity in a snapshot; every claim is validated against the artifact's
own grounding spans by a DETERMINISTIC sub-property gate, unvalidated claims are
dropped, and a `description` artifact is stored only if something survives —
grounded on the same spans (extraction_method=llm_grounded, model_id +
prompt_version in the key). Never on the offline `kb index` path.

- kb/extract/semantic: grounding.validate_claims (deterministic, no LLM) +
  describe.describe_snapshot (orchestration, reuses kb.llm + write_grounded_artifact).
- queries.spans_for_artifact (span_id + fq_symbol_path + raw_text).
- CLI `kb describe` (lazy LLM import, key-gated via has_llm_key).
- MCP summarize + embed_text gain a `description` branch.
- HARD gate eval/semantic_grounding_test (stub LLM, no key): an adversarial
  fabricated claim is dropped, the grounded one stored, description served as
  llm_grounded. Headline gates eight -> nine.
- docs: README (Status, Quickstart, architecture, nine gates), DESIGN §11/§9
  (un-defer kb.extract.semantic, semantic floor implemented), CHANGELOG.

ruff + mypy --strict clean; 54 eval tests pass (+1 skipped). Stub end-to-end:
descriptions inherit cross-file grounding; an all-ungrounded route's description
is not stored at all.
@v0ropaev v0ropaev merged commit ca77d45 into master Jun 21, 2026
4 checks passed
@v0ropaev v0ropaev deleted the feat/llm-describe branch June 21, 2026 21:48