
feat: per-module LLM-grounded descriptions (kb describe) #15

Merged: v0ropaev merged 1 commit into master from feat/llm-grounded-module-descriptions on Jun 22, 2026

@v0ropaev (Owner)

What

Second slice of the LLM-grounded semantic layer: the key-gated kb describe pass now also describes each first-party module (file), not just api_route / entity artifacts.

  • store.queries.module_targets(conn, sha) + ModuleTarget — a module is not an artifact, so it is enumerated from its span occurrences at the snapshot SHA (first-party-only, since the pipeline indexes only files under the first-party root). The module name is the fq path of the file's module span; the target carries all of the file's spans (module + classes/functions/imports).
  • describe.py refactor — extracted a shared _describe_one(...) reused by the existing artifact loop and a new module loop. Module descriptions use target_kind="module" and logical key desc:module:<fqname>, grounded on all of the file's spans (role describes). The prompt source body is capped (~6000 chars) while validation still runs over every span.
  • Anti-hallucination unchanged — the same deterministic sub-property gate (grounding.validate_claims) drops any claim whose cited symbol does not occur in the file's spans; a module is described only if ≥1 real symbol survives.

Gate

kb.eval.semantic_grounding_test is extended with the module path (run on a stub LLM, no API key): a module where the real symbol occurs is described with the fabricated symbol dropped; a module with no matching symbol (e.g. app.main, app.__init__) gets no description. Headline HARD gate count stays nine (extended the existing gate, no new gate file).

Out of scope

Per-package / whole-repo architecture overviews; business-process / call-graph extraction; real-LLM describe in the CI gate (nightly only). No release in this cycle — accumulates in [Unreleased].

Checks

  • ruff check src/kb + mypy clean (65 files)
  • pytest — 55 passed, 1 skipped (key-gated LLM judge)
  • kb index stays offline; LLM only in kb describe

Extend the key-gated `kb describe` pass to describe each first-party
module (file), not just `api_route` / `entity` artifacts.

A module is not an artifact, so it is enumerated from its span
occurrences at the snapshot SHA (`store.queries.module_targets` ->
`ModuleTarget`) and grounded on ALL of the file's spans (module + its
classes/functions/imports). `describe.py` is refactored to a shared
`_describe_one(...)` reused by the artifact loop and a new module loop;
module descriptions use `target_kind="module"` and logical key
`desc:module:<fqname>`. The prompt source body is capped while
validation still runs over every span.
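
As a rough sketch of the enumeration and capping described above, assuming spans arrive as flat rows carrying a file path and a kind: `SpanRow`, `ModuleTargetSketch`, and every function name below are hypothetical and are not the actual `store.queries.module_targets` API. The cap constant mirrors the "~6000 chars" figure from the PR description; the exact value is assumed.

```python
from collections import defaultdict
from dataclasses import dataclass

PROMPT_BODY_CAP = 6000  # assumed exact value for the "~6000 chars" cap


@dataclass(frozen=True)
class SpanRow:
    # Hypothetical row shape for one span at the snapshot SHA.
    path: str
    kind: str  # e.g. "module", "class", "function", "import"
    fqname: str
    source: str


@dataclass(frozen=True)
class ModuleTargetSketch:
    fqname: str  # fq path of the file's module span
    spans: tuple[SpanRow, ...]  # all spans of the file, module span included


def module_targets_sketch(rows: list[SpanRow]) -> list[ModuleTargetSketch]:
    """Group spans by file and name each target after its module span."""
    by_path: dict[str, list[SpanRow]] = defaultdict(list)
    for row in rows:
        by_path[row.path].append(row)
    targets = []
    for path in sorted(by_path):
        file_spans = by_path[path]
        module_span = next((s for s in file_spans if s.kind == "module"), None)
        if module_span is None:
            continue  # assumption: a file without a module span is skipped
        targets.append(ModuleTargetSketch(module_span.fqname, tuple(file_spans)))
    return targets


def logical_key(target: ModuleTargetSketch) -> str:
    return f"desc:module:{target.fqname}"


def prompt_body(target: ModuleTargetSketch, cap: int = PROMPT_BODY_CAP) -> str:
    """Only the prompt text is truncated; target.spans stays complete for validation."""
    return "\n".join(s.source for s in target.spans)[:cap]
```

Keeping the cap inside `prompt_body` rather than trimming `spans` means a symbol defined late in a long file can still ground a claim even though the model never saw its source.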

No new invariants: the same deterministic sub-property gate
(`grounding.validate_claims`) drops any claim whose cited symbol does
not occur in the file's spans, so a module is described only if a real
symbol survives. The `semantic_grounding` HARD gate is extended with the
module path (adversarial fabricated claim dropped; a module with no
matching symbol gets no description). Headline gate count stays nine.

README / DESIGN / CHANGELOG updated; `kb index` stays offline.
v0ropaev merged commit 730f2e4 into master on Jun 22, 2026
4 checks passed
v0ropaev deleted the feat/llm-grounded-module-descriptions branch on June 22, 2026 08:16