v0ropaev · v0ropaev · Jun 21, 2026 · Jun 21, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,7 +9,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
-- (nothing yet)
+- **LLM-grounded semantic layer — first slice** (`kb.extract.semantic`, `kb describe`): an optional,
+  key-gated pass (separate from `kb index`) has an LLM write a short NL summary + structured claims
+  for each `api_route` / `entity` artifact in a snapshot. Every claim is validated against the
+  artifact's own grounding spans by a **deterministic sub-property gate**
+  (`grounding.validate_claims`) — claims citing a symbol not in the code are dropped, and a
+  `description` artifact is stored only if something survives, grounded on the same spans
+  (`extraction_method = "llm_grounded"`, `model_id` + `prompt_version` in the artifact key). Surfaced
+  via MCP `get_knowledge` / `search_knowledge`. Uses `kb.llm` (Anthropic default, OpenAI optional).
+- **Semantic grounding HARD gate** (`kb.eval.semantic_grounding_test`): runs the describer on a
+  **stub** LLM (no API key) and asserts an adversarial fabricated claim is dropped while the grounded
+  claim is stored — the DESIGN §9 semantic floor, enforced deterministically in CI. Headline HARD
+  gates: eight → **nine**.
 
 ## [0.3.0] - 2026-06-21
 

diff --git a/DESIGN.md b/DESIGN.md
@@ -285,6 +285,12 @@ rejected. **Verbalized LLM confidence is never used as the score.**
 > in a process summary is a real sink-registry match on the path; path endpoints are real
 > entrypoints/sinks. Confidence must honestly count *unknown-unknowns* (edges never discovered
 > by the ~70%-recall call-graph engine), not only "unresolved on the path it found".
+>
+> *Implemented (first slice):* the `kb describe` describer enforces this floor —
+> `kb.extract.semantic.grounding.validate_claims` drops any claim whose cited symbol is absent from
+> the artifact's grounding spans; an artifact with no surviving claim is not stored. The gate is
+> deterministic, so `semantic_grounding_test` enforces it in CI (stub LLM, no API key), including an
+> adversarial fabricated claim that must be dropped.
 
 ---
 
@@ -319,7 +325,7 @@ freshness(current|stale@sha)`, with a deterministic tie-break for reproducible e
 | `kb.eval` | Tiered eval; deterministic tiers gate CI. | pytest over SHA-pinned golden repos |
 | `kb.mcp` | Read-only MCP server; provenance-carrying records; budget-aware assembly. | FastMCP (pinned), Pydantic models |
 | `kb.daemon` | Orchestration + CLI: index a repo @ SHA, run extractors in order, write snapshot, host MCP. | typer |
-| `kb.extract.semantic` *(deferred)* | The one grounded business-process extractor: entrypoints → call-graph slice → sinks → LLM labeler → span-binding validator. | tree-sitter queries, `PathEngine` (call-graph), YAML sink registry, thin LLM adapter |
+| `kb.extract.semantic` | **First slice shipped:** `kb describe` — LLM-grounded NL descriptions of routes/entities, each claim validated against the artifact's spans by a deterministic sub-property gate (`grounding.validate_claims`); separate key-gated pass, never on `index`. *Deferred:* the grounded business-process extractor (entrypoints → call-graph slice → sinks → LLM labeler → span-binding validator). | thin LLM adapter (`kb.llm`); later: `PathEngine` (call-graph), YAML sink registry |
 
 ---
 

diff --git a/README.md b/README.md
@@ -89,7 +89,8 @@ flowchart LR
 - **Read-only MCP server** — `find_provenance`, `get_knowledge`, and `search_knowledge`, each returning provenance-carrying units (method + confidence + freshness).
 - **pgvector embeddings + semantic search** — a replaceable embedding provider (sentence-transformers by default, OpenAI optional) populated by a separate `kb embed` pass; torch stays out of the index path.
 - **A frozen RAG-over-source baseline** and the **Tier-3 knowledge-vs-RAG recall gate** — the honest A/B that backs the "knowledge > RAG" thesis.
-- **Eight HARD CI eval gates** (see [Development](#development)).
+- **LLM-grounded descriptions** — an optional, key-gated `kb describe` pass has an LLM write NL summaries for routes/entities; every claim is validated against the artifact's own spans by a deterministic sub-property gate, so ungrounded claims are *dropped* (the anti-hallucination invariant, with a model in the loop). Stored as `extraction_method = "llm_grounded"`, grounded on the same spans.
+- **Nine HARD CI eval gates** (see [Development](#development)).
 
 - **A nightly LLM-judged A/B** (optional, key-gated, **non-gating**) — an answerer LLM answers each question from knowbase's grounded context vs a RAG-over-source context, and a judge LLM scores **answer accuracy** (against hand-written gold) + **hallucination**. Tracked metrics on top of recall; it never blocks CI.
 
@@ -115,7 +116,7 @@ The base `--extra dev` install stays torch-free; the `embed` extra pulls sentenc
 ### Run the gates
 
 ```bash
-uv run pytest src/kb/eval -q   # the eight HARD gates (spins an ephemeral local Postgres)
+uv run pytest src/kb/eval -q   # the nine HARD gates (spins an ephemeral local Postgres)
 ```
 
 ### Index a commit
@@ -142,6 +143,14 @@ uv run kb embed --db-url <postgres-url>   # separate pass: populate artifact emb
 
 `kb embed` runs a replaceable embedding provider (sentence-transformers `all-MiniLM-L6-v2` by default, OpenAI optional via `KB_EMBED_PROVIDER=openai`) over the latest snapshot's artifacts and writes them into `artifact.embedding` (pgvector). It is idempotent and torch only loads when this command runs — never on the index path.
 
+### Generate semantic descriptions (LLM-grounded)
+
+```bash
+uv run kb describe --db-url <postgres-url>   # separate, key-gated pass (ANTHROPIC_API_KEY / OPENAI_API_KEY)
+```
+
+`kb describe` has an LLM (via `kb.llm`, `KB_LLM_PROVIDER` in {`anthropic`,`openai`}) write a short NL summary + structured claims for each route/entity in the latest snapshot. **Every claim is validated against that artifact's own grounding spans** — claims citing a symbol not in the code are dropped, and a `description` artifact is stored only if something survives, grounded on the same spans (`extraction_method = "llm_grounded"`). It needs an API key, never runs on `kb index`, and the deterministic grounding gate is exercised in CI without a key (stub LLM).
+
 ### Serve to an AI agent (MCP)
 
 ```bash
@@ -209,8 +218,9 @@ A Python package `kb` (uv, src-layout). Modules and their responsibilities:
 | `kb.mcp` | Read-only MCP server and its provenance-carrying records: `find_provenance`, `get_knowledge`, `search_knowledge`. |
 | `kb.embed` | Replaceable embedding adapters (sentence-transformers default, OpenAI optional) + snapshot population. Torch isolated behind the `embed` extra and a lazy import. |
 | `kb.rag` | The frozen pgvector RAG-over-source baseline — the "other arm" of the knowledge-vs-RAG A/B (no provenance, no grounding). |
-| `kb.daemon.cli` | The `kb` CLI: `index`, `migrate`, `embed`, `serve` (MCP), and `introspect` — all functional. |
-| `kb.eval` | Eight HARD CI gates (identity reproducibility, adversarial grounding, Tier-1 import oracle, Tier-1 API oracle, Tier-1 entities oracle, Tier-3 knowledge-vs-RAG recall, Tier-4 one-hop invalidation, invariants) plus the supporting MCP / embed / store suite. |
+| `kb.extract.semantic` | LLM-grounded extraction (`kb describe`): NL descriptions of routes/entities with a deterministic sub-property gate (`grounding.validate_claims`) that drops any claim not backed by the artifact's spans. Separate key-gated pass; never on `index`. |
+| `kb.daemon.cli` | The `kb` CLI: `index`, `migrate`, `embed`, `describe`, `serve` (MCP), and `introspect` — all functional. |
+| `kb.eval` | Nine HARD CI gates (identity reproducibility, adversarial grounding, Tier-1 import oracle, Tier-1 API oracle, Tier-1 entities oracle, Tier-3 knowledge-vs-RAG recall, Tier-4 one-hop invalidation, invariants, semantic grounding floor) plus the supporting MCP / embed / store suite. |
 
 Core tables: `commit_ref`, `branch_ref`, `code_span`, `span_occurrence`, `artifact` (now with `embedding vector(384)` + `embedding_model_id`), `artifact_derived_from`, `snapshot_entry`, and `rag_chunk` (the baseline arm).
 
@@ -220,10 +230,10 @@ Core tables: `commit_ref`, `branch_ref`, `code_span`, `span_occurrence`, `artifa
 uv sync --extra dev            # venv + install
 uv run ruff check src/kb       # lint
 uv run mypy                    # strict type-check
-uv run pytest src/kb/eval -q   # the eight HARD eval gates
+uv run pytest src/kb/eval -q   # the nine HARD eval gates
 ```
 
-CI (GitHub Actions, workflow **"CI"**, `.github/workflows/ci.yml`) runs ruff, `mypy --strict`, and the eval gates against a `pgvector/pgvector:pg17` service (with the embedding model cached). The **eight HARD gates** that block a merge:
+CI (GitHub Actions, workflow **"CI"**, `.github/workflows/ci.yml`) runs ruff, `mypy --strict`, and the eval gates against a `pgvector/pgvector:pg17` service (with the embedding model cached). The **nine HARD gates** that block a merge:
 
 1. **Identity reproducibility** — formatting / comment / docstring / location changes must NOT change `span_id`; a rename MUST. Pure identity core, no database.
 2. **Adversarial grounding** — an ungrounded artifact is rejected by *both* layers (the app's `GroundingError` and the DB's deferred `artifact_grounded_check` trigger); a genuinely grounded artifact commits cleanly.
@@ -233,6 +243,7 @@ CI (GitHub Actions, workflow **"CI"**, `.github/workflows/ci.yml`) runs ruff, `m
 6. **Tier-3 knowledge-vs-RAG recall** — knowbase cross-file recall@k == 1.0 for every cross-file question (API contracts **and** domain entities: in each case one artifact already spans both files, so the floor is *structural*, independent of embedding quality); the RAG arm is reported but **never asserted**, so a model bump can't redden CI.
 7. **Tier-4 one-hop invalidation** — a content diff invalidates *exactly* the artifacts whose grounding span changed (set-equality: no over-invalidation, no stale survivors); a version bump invalidates everything.
 8. **Invariants** — zero orphans (every snapshot artifact is grounded), and re-indexing the same SHA yields the identical set of artifact ids.
+9. **Semantic grounding floor** — the LLM-grounded describer's claims are validated against the artifact's own spans by a deterministic sub-property gate; an adversarial fabricated claim is *dropped*, never stored (run on a stub LLM, so it gates without an API key).
 
 The identity rules in `kb.ids` (and `kb.structural`) are **LOCKED**: changing one is a breaking change, gated behind a `NORMALIZATION_VERSION` / `extractor_version` bump so existing digests are invalidated rather than silently colliding.
 

diff --git a/src/kb/daemon/cli.py b/src/kb/daemon/cli.py
@@ -77,6 +77,36 @@ def embed(
         engine.dispose()
 
 
+@app.command()
+def describe(
+    db_url: str | None = typer.Option(None, "--db-url", help="Postgres URL (else KB_DB_URL env)."),
+    sha: str | None = typer.Option(None, "--sha", help="Snapshot sha to describe (def: latest)."),
+) -> None:
+    """LLM-grounded NL descriptions for a snapshot (separate key-gated pass; never on `index`)."""
+    from kb.extract.semantic.describe import describe_snapshot  # lazy: keeps the LLM off other cmds
+    from kb.llm.providers import default_llm_provider, has_llm_key
+    from kb.store.queries import latest_ingested_sha
+
+    if not has_llm_key():
+        typer.echo("no LLM API key (set ANTHROPIC_API_KEY or OPENAI_API_KEY)")
+        raise typer.Exit(code=1)
+    engine = make_engine(db_url)
+    try:
+        with engine.connect() as conn:
+            target = sha or latest_ingested_sha(conn)
+        if target is None:
+            typer.echo("no snapshot to describe")
+            raise typer.Exit(code=1)
+        provider = default_llm_provider()
+        result = describe_snapshot(engine, target, provider)
+        typer.echo(
+            f"described {result.described} artifacts (dropped {result.dropped_claims} claims) "
+            f"@ {target[:12]} with {provider.model_id}"
+        )
+    finally:
+        engine.dispose()
+
+
 @app.command()
 def serve(
     db_url: str | None = typer.Option(None, "--db-url", help="Postgres URL (else KB_DB_URL env)."),

diff --git a/src/kb/embed/text.py b/src/kb/embed/text.py
@@ -35,4 +35,7 @@ def embed_text(kind: str, payload: dict[str, Any]) -> str:
             + " ".join(str(r.get("name", "")) for r in payload.get("related_entities", [])),
         ]
         return " ".join(p for p in parts if p.strip())
+    if kind == "description":
+        claims = " ".join(str(c.get("text", "")) for c in payload.get("claims", []))
+        return f"{payload.get('summary', '')} {claims}".strip() or head
     return head
diff --git a/src/kb/eval/semantic_grounding_test.py b/src/kb/eval/semantic_grounding_test.py
@@ -0,0 +1,91 @@
+"""HARD GATE — semantic floor (DESIGN.md §9): LLM-grounded claims are span-validated.
+
+Uses a STUB LLM provider (fixed output: one real symbol + one fabricated one), so the
+anti-hallucination invariant of the LLM layer is enforced **deterministically and without an API
+key** — it gates in normal CI. The describer must store only the grounded claim and drop the
+fabricated one; the description is grounded (role `describes`) on its target's spans and served as
+`llm_grounded`.
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from sqlalchemy import Engine, select
+
+from kb.daemon.pipeline import index_commit
+from kb.eval._fixtures import make_git_repo
+from kb.eval.tier1_api_test import FILES
+from kb.extract.deterministic.fastapi_contract import FastAPIExtractor
+from kb.extract.semantic.describe import describe_snapshot
+from kb.extract.semantic.grounding import validate_claims
+from kb.store import models as m
+from kb.store.queries import provenance_for_artifact
+
+REAL = "OrderOut"  # appears in the fixture (schemas.py + the routes' response_model)
+FAKE = "nonexistent_symbol_xyz"  # appears nowhere -> must be dropped as a hallucination
+
+
+class _StubProvider:
+    """Deterministic stand-in for an LLMProvider: always returns one real + one fabricated claim."""
+
+    model_id = "stub:describe-test"
+
+    def complete(self, system: str, user: str, *, max_tokens: int = 1024) -> str:
+        return json.dumps(
+            {
+                "summary": "Stub description.",
+                "claims": [
+                    {"text": f"returns {REAL}", "symbol": REAL},
+                    {"text": "calls a fabricated helper", "symbol": FAKE},
+                ],
+            }
+        )
+
+
+def _index(engine: Engine, tmp_path: Path) -> str:
+    sha = make_git_repo(tmp_path, [FILES])[0]
+    index_commit(
+        engine, str(tmp_path), sha, extractors=[FastAPIExtractor()], first_party_root="src"
+    )
+    return sha
+
+
+def test_validator_drops_fabricated_symbol() -> None:
+    claims = [{"text": "a", "symbol": REAL}, {"text": "b", "symbol": FAKE}]
+    kept, dropped = validate_claims(
+        claims, ["class OrderOut(BaseModel):\n    id: int\n"], ["app.schemas.OrderOut"]
+    )
+    assert [c["symbol"] for c in kept] == [REAL]
+    assert [c["symbol"] for c in dropped] == [FAKE]
+
+
+def test_describe_stores_only_grounded_claims(engine: Engine, tmp_path: Path) -> None:
+    sha = _index(engine, tmp_path)
+    result = describe_snapshot(engine, sha, _StubProvider())
+
+    assert result.described > 0
+    assert result.dropped_claims > 0  # the fabricated claim was dropped on every artifact
+
+    join = m.snapshot_entry.join(
+        m.artifact, m.artifact.c.artifact_id == m.snapshot_entry.c.artifact_id
+    )
+    with engine.connect() as conn:
+        rows = conn.execute(
+            select(
+                m.artifact.c.logical_key,
+                m.artifact.c.payload,
+                m.artifact.c.is_deterministic,
+            )
+            .select_from(join)
+            .where(m.snapshot_entry.c.sha == sha, m.artifact.c.kind == "description")
+        ).all()
+        assert rows
+        for row in rows:
+            symbols = [c["symbol"] for c in row.payload["claims"]]
+            assert REAL in symbols  # the grounded claim survives
+            assert FAKE not in symbols  # adversarial: the hallucinated claim is never stored
+            assert row.is_deterministic is False  # surfaced as llm_grounded
+            prov_files = {p.file_path for p in provenance_for_artifact(conn, sha, row.logical_key)}
+            assert prov_files  # grounded on its target's spans (>= 1 file)
diff --git a/src/kb/extract/semantic/__init__.py b/src/kb/extract/semantic/__init__.py
@@ -0,0 +1,7 @@
+"""LLM-grounded semantic extraction (DESIGN.md §4, §9) — the first model-backed knowledge layer.
+
+Runs as a separate, key-gated pass (``kb describe``), never on the deterministic ``kb index`` path.
+Every produced claim is validated against the artifact's own grounding spans by a deterministic
+sub-property gate (``grounding.validate_claims``); unvalidated claims are dropped — the
+anti-hallucination invariant, enforced without a model in the loop so it is gateable in CI.
+"""