Merged
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -9,7 +9,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

-- (nothing yet)
+- **Deterministic entities extractor** (`kb.extract.deterministic.entities`): a fully static
+(tree-sitter) extractor that emits one `entity` artifact per domain class — pydantic `BaseModel`,
+`@dataclass`, and SQLAlchemy declarative model — with its fields, grounded on the class-definition
+span. Detection signals and limits are recorded in the payload (transitive bases / imperative
+SQLAlchemy mapping are documented gaps, not silent losses); `framework_versions` (pydantic /
+sqlalchemy) is folded into the artifact key. Surfaced via MCP `get_knowledge`/`search_knowledge`.
+- **Tier-1 entities gate** (`kb.eval.tier1_entities_test`): a hand-labeled HARD gate — extracted
+entities + fields match the oracle, a bare declarative `Base` is not an entity, a `create_model(...)`
+model is asserted as a known gap, and every entity is grounded on a `class` span. Brings the headline
+HARD gates to **eight**.

## [0.2.0] - 2026-06-02

6 changes: 3 additions & 3 deletions DESIGN.md
@@ -305,7 +305,7 @@ freshness(current|stale@sha)`, with a deterministic tie-break for reproducible e
| Module | Responsibility | Key tech |
|--------|----------------|----------|
| `kb.structural` | Parse Python without executing it; enumerate symbols/imports/call-sites with per-SHA byte/line ranges; compute content-addressed span identity; incremental reparse. Hidden behind a `StructuralIndex`/`PathEngine` interface so a SCIP backend can replace tree-sitter later. | tree-sitter + tree-sitter-python (canonical bindings) |
-| `kb.extract.deterministic` | No-LLM extractors → exact artifacts (confidence=1.0): import graph; FastAPI API contract (static, cross-file grounded); griffe library surface (planned). | grimp, tree-sitter queries, griffe (static) |
+| `kb.extract.deterministic` | No-LLM extractors → exact artifacts (confidence=1.0): import graph; FastAPI API contract (static, cross-file grounded); domain entities (pydantic/dataclass/SQLAlchemy, static, hand-labeled gate); griffe library surface (planned). | grimp, tree-sitter queries, griffe (static) |
| `kb.introspect` | Eval-only runtime oracle: runs a FastAPI app in a network-blocked sandbox and emits `app.openapi()` for the Tier-1 API gate. Never on the index path. | subprocess sandbox, fastapi |
| `kb.embed` | Replaceable embedding adapters + snapshot population for `search_knowledge`. Torch isolated behind the `embed` extra and a lazy import. | sentence-transformers (default), OpenAI (optional), pgvector |
| `kb.rag` | Frozen pgvector RAG-over-source baseline — the "other arm" of the knowledge-vs-RAG A/B (no provenance/grounding). | deterministic line-window chunker, pgvector |
@@ -382,8 +382,8 @@ Review fact-checked these against current (2026) sources. Caveats are first-clas

## 14. Roadmap (post-MVP, indicative)

-1. Second deterministic family fully (entities via griffe/SQLAlchemy/pydantic; events where a
-real oracle exists).
+1. Second deterministic family: **entities (pydantic/dataclass/SQLAlchemy) — shipped** (static
+tree-sitter, hand-labeled Tier-1 gate); events where a real oracle exists (next).
2. The **one** grounded business-process extractor (named real path + labeler + validator +
deterministic sub-property gate).
3. Recursive invalidation (`artifact_depends_on`), multi-branch dedup, freshness precompute.
20 changes: 11 additions & 9 deletions README.md
@@ -83,12 +83,12 @@ flowchart LR
**v0.2 — spine + the first knowledge extractors, MCP serving, and the knowledge-vs-RAG gate.** Everything here grounds what it claims, and nothing it cannot:

- **Provenance spine** — content-addressed `span_id` (LOCKED); tree-sitter spans with a normalized S-expression fingerprint and per-SHA location; a single-Postgres, Alembic-managed store with content-addressed idempotent writes; the ≥ 1 `derived_from` anti-hallucination invariant enforced in-app *and* by a deferred DB trigger; pygit2 git ingest (no checkout) with a diff-based invalidation seed.
-- **Deterministic extractors** — the **import / dependency graph** (grimp resolves the edge, tree-sitter grounds it on the exact import statement, with an honest `approximate` fallback for re-exports / relative / unmappable imports — never a silent loss), and the **FastAPI API-contract** extractor, which grounds a single route **across files** (handler in `routes.py` + `response_model` class in `schemas.py`).
+- **Deterministic extractors** — the **import / dependency graph** (grimp resolves the edge, tree-sitter grounds it on the exact import statement, with an honest `approximate` fallback for re-exports / relative / unmappable imports — never a silent loss); the **FastAPI API-contract** extractor, which grounds a single route **across files** (handler in `routes.py` + `response_model` class in `schemas.py`); and the **domain-entity** extractor (pydantic / dataclass / SQLAlchemy classes and their fields, grounded on the class definition — purely static, with documented detection limits).
- **`kb introspect`** — a sandboxed, network-blocked `app.openapi()` oracle, eval-only and never on the index path, that the API gate scores the static contract against.
- **Read-only MCP server** — `find_provenance`, `get_knowledge`, and `search_knowledge`, each returning provenance-carrying units (method + confidence + freshness).
- **pgvector embeddings + semantic search** — a replaceable embedding provider (sentence-transformers by default, OpenAI optional) populated by a separate `kb embed` pass; torch stays out of the index path.
- **A frozen RAG-over-source baseline** and the **Tier-3 knowledge-vs-RAG recall gate** — the honest A/B that backs the "knowledge > RAG" thesis.
-- **Seven HARD CI eval gates** (see [Development](#development)).
+- **Eight HARD CI eval gates** (see [Development](#development)).

**Not done yet** (and deliberately not faked): the semantic / **LLM-grounded** extraction layer, the nightly LLM-judged A/B, ADR mining from git history, grounded business-process extraction, incremental re-index on git push, and languages beyond Python. See the [Roadmap](#roadmap).

@@ -112,7 +112,7 @@ The base `--extra dev` install stays torch-free; the `embed` extra pulls sentenc
### Run the gates

```bash
-uv run pytest src/kb/eval -q   # the seven HARD gates (spins an ephemeral local Postgres)
+uv run pytest src/kb/eval -q   # the eight HARD gates (spins an ephemeral local Postgres)
```

### Index a commit
@@ -173,12 +173,13 @@ A Python package `kb` (uv, src-layout). Modules and their responsibilities:
| `kb.git` | pygit2 ingest — reads blobs at a SHA (no checkout) — plus the diff-based invalidation seed. |
| `kb.extract.deterministic.imports` | Deterministic import / dependency edges: tree-sitter spans grounded by line, grimp edge resolution. |
| `kb.extract.deterministic.fastapi_contract` | Static FastAPI API-contract extractor; grounds a route across files (handler + `response_model` class), never imports user code. |
+| `kb.extract.deterministic.entities` | Static domain-entity extractor — pydantic / dataclass / SQLAlchemy classes + their fields, grounded on the class definition; detection signals and limits recorded in the payload. |
| `kb.introspect` | Sandboxed, network-blocked `app.openapi()` oracle — eval-only ground truth for the API gate, never on the index path. |
| `kb.mcp` | Read-only MCP server and its provenance-carrying records: `find_provenance`, `get_knowledge`, `search_knowledge`. |
| `kb.embed` | Replaceable embedding adapters (sentence-transformers default, OpenAI optional) + snapshot population. Torch isolated behind the `embed` extra and a lazy import. |
| `kb.rag` | The frozen pgvector RAG-over-source baseline — the "other arm" of the knowledge-vs-RAG A/B (no provenance, no grounding). |
| `kb.daemon.cli` | The `kb` CLI: `index`, `embed`, `serve` (MCP), and `introspect` — all functional. |
-| `kb.eval` | Seven HARD CI gates (identity reproducibility, adversarial grounding, Tier-1 import oracle, Tier-1 API oracle, Tier-3 knowledge-vs-RAG recall, Tier-4 one-hop invalidation, invariants) plus the supporting MCP / embed / store suite. |
+| `kb.eval` | Eight HARD CI gates (identity reproducibility, adversarial grounding, Tier-1 import oracle, Tier-1 API oracle, Tier-1 entities oracle, Tier-3 knowledge-vs-RAG recall, Tier-4 one-hop invalidation, invariants) plus the supporting MCP / embed / store suite. |

Core tables: `commit_ref`, `branch_ref`, `code_span`, `span_occurrence`, `artifact` (now with `embedding vector(384)` + `embedding_model_id`), `artifact_derived_from`, `snapshot_entry`, and `rag_chunk` (the baseline arm).

@@ -188,18 +189,19 @@ Core tables: `commit_ref`, `branch_ref`, `code_span`, `span_occurrence`, `artifa
uv sync --extra dev # venv + install
uv run ruff check src/kb # lint
uv run mypy # strict type-check
-uv run pytest src/kb/eval -q   # the seven HARD eval gates
+uv run pytest src/kb/eval -q   # the eight HARD eval gates
```

-CI (GitHub Actions, workflow **"CI"**, `.github/workflows/ci.yml`) runs ruff, `mypy --strict`, and the eval gates against a `pgvector/pgvector:pg17` service (with the embedding model cached). The **seven HARD gates** that block a merge:
+CI (GitHub Actions, workflow **"CI"**, `.github/workflows/ci.yml`) runs ruff, `mypy --strict`, and the eval gates against a `pgvector/pgvector:pg17` service (with the embedding model cached). The **eight HARD gates** that block a merge:

1. **Identity reproducibility** — formatting / comment / docstring / location changes must NOT change `span_id`; a rename MUST. Pure identity core, no database.
2. **Adversarial grounding** — an ungrounded artifact is rejected by *both* layers (the app's `GroundingError` and the DB's deferred `artifact_grounded_check` trigger); a genuinely grounded artifact commits cleanly.
3. **Tier-1 import oracle** — extracted import edges match a hand-labeled oracle, grounded on the actual import statement span; a dynamic import is asserted as a *known* gap, not a silent loss.
4. **Tier-1 API oracle** — the statically-extracted FastAPI contract equals the app's own `openapi()` (from the sandboxed introspect oracle), and the route's cross-file grounding (handler + `response_model`) is asserted.
-5. **Tier-3 knowledge-vs-RAG recall** — knowbase cross-file recall@k == 1.0 for every contract question (a *structural* floor: one artifact already spans both files, so it holds regardless of embedding quality); the RAG arm is reported but **never asserted**, so a model bump can't redden CI.
-6. **Tier-4 one-hop invalidation** — a content diff invalidates *exactly* the artifacts whose grounding span changed (set-equality: no over-invalidation, no stale survivors); a version bump invalidates everything.
-7. **Invariants** — zero orphans (every snapshot artifact is grounded), and re-indexing the same SHA yields the identical set of artifact ids.
+5. **Tier-1 entities oracle** — extracted pydantic / dataclass / SQLAlchemy entities + their fields match a hand-labeled oracle, each grounded on its class span; a bare declarative `Base` is correctly *not* an entity and a `create_model(...)` model is asserted as a *known* gap.
+6. **Tier-3 knowledge-vs-RAG recall** — knowbase cross-file recall@k == 1.0 for every contract question (a *structural* floor: one artifact already spans both files, so it holds regardless of embedding quality); the RAG arm is reported but **never asserted**, so a model bump can't redden CI.
+7. **Tier-4 one-hop invalidation** — a content diff invalidates *exactly* the artifacts whose grounding span changed (set-equality: no over-invalidation, no stale survivors); a version bump invalidates everything.
+8. **Invariants** — zero orphans (every snapshot artifact is grounded), and re-indexing the same SHA yields the identical set of artifact ids.

The identity rules in `kb.ids` (and `kb.structural`) are **LOCKED**: changing one is a breaking change, gated behind a `NORMALIZATION_VERSION` / `extractor_version` bump so existing digests are invalidated rather than silently colliding.

5 changes: 4 additions & 1 deletion src/kb/daemon/cli.py
@@ -11,6 +11,7 @@
import typer

from kb.daemon.pipeline import index_commit
+from kb.extract.deterministic.entities import EntityExtractor
from kb.extract.deterministic.fastapi_contract import FastAPIExtractor
from kb.extract.deterministic.imports import ImportExtractor
from kb.introspect import introspect_app
@@ -27,7 +28,9 @@ def index(
) -> None:
    """Index one commit: ingest, parse spans, run deterministic extractors, write the snapshot."""
    engine = make_engine(db_url)
-    result = index_commit(engine, repo, sha, extractors=[ImportExtractor(), FastAPIExtractor()])
+    result = index_commit(
+        engine, repo, sha, extractors=[ImportExtractor(), FastAPIExtractor(), EntityExtractor()]
+    )
    engine.dispose()
    typer.echo(
        f"indexed {result.sha[:12]}: {result.files_indexed} files, {result.spans} spans, "
134 changes: 134 additions & 0 deletions src/kb/eval/tier1_entities_test.py
@@ -0,0 +1,134 @@
"""HARD GATE — Tier 1: domain entities vs a hand-labeled oracle (DESIGN.md §4, §9).

The hand-labeled ``EXPECTED_ENTITIES`` / ``EXPECTED_FIELDS`` are the real oracle (importing the
models to introspect them would execute user code). A bare declarative ``Base`` must NOT be an
entity, and a dynamically-built model (``create_model``) is a deliberate static-analysis blind spot,
asserted as a KNOWN gap — not a silent loss. Every entity is grounded on its class-definition span.
"""

from __future__ import annotations

from pathlib import Path

from sqlalchemy import Engine, select

from kb.daemon.pipeline import index_commit
from kb.eval._fixtures import make_git_repo
from kb.extract.deterministic.entities import EntityExtractor
from kb.store import models as m

# A src-layout module: a pydantic model, a dataclass, a SQLAlchemy model (plus a bare declarative
# Base that is NOT an entity), and a dynamically-built model (invisible to static parsing).
FILES = {
    "src/shop/__init__.py": "",
    "src/shop/models.py": (
        "from dataclasses import dataclass\n"
        "from pydantic import BaseModel, create_model\n"
        "from sqlalchemy import Column, Integer\n"
        "from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column\n"
        "\n\n"
        "class Order(BaseModel):\n"
        "    id: int\n"
        "    total: float = 0.0\n"
        "    note: str | None = None\n"
        "\n\n"
        "@dataclass\n"
        "class LineItem:\n"
        "    sku: str\n"
        "    qty: int = 1\n"
        "\n\n"
        "class Base(DeclarativeBase):\n"
        "    pass\n"
        "\n\n"
        "class User(Base):\n"
        '    __tablename__ = "users"\n'
        "    id: Mapped[int] = mapped_column(primary_key=True)\n"
        "    name: Mapped[str] = mapped_column()\n"
        "    legacy = Column(Integer)\n"
        "\n\n"
        'Dynamic = create_model("Dynamic", x=(int, ...))\n'
    ),
}

# Hand-labeled oracle: (framework, fq class). `Base` and `Dynamic` are deliberately absent.
EXPECTED_ENTITIES = {
    ("pydantic", "shop.models.Order"),
    ("dataclass", "shop.models.LineItem"),
    ("sqlalchemy", "shop.models.User"),
}
EXPECTED_FIELDS = {
    "shop.models.Order": {"id", "total", "note"},
    "shop.models.LineItem": {"sku", "qty"},
    "shop.models.User": {"id", "name", "legacy"},  # __tablename__ is metadata, not a field
}
KNOWN_GAP = "shop.models.Dynamic"  # create_model(): dynamic, invisible to static analysis


def _index(engine: Engine, tmp_path: Path) -> str:
    sha = make_git_repo(tmp_path, [FILES])[0]
    index_commit(engine, str(tmp_path), sha, extractors=[EntityExtractor()], first_party_root="src")
    return sha


def _entity_payloads(engine: Engine, sha: str) -> list[dict]:
    join = m.snapshot_entry.join(
        m.artifact, m.artifact.c.artifact_id == m.snapshot_entry.c.artifact_id
    )
    with engine.connect() as conn:
        return list(
            conn.execute(
                select(m.artifact.c.payload)
                .select_from(join)
                .where(m.snapshot_entry.c.sha == sha, m.artifact.c.kind == "entity")
            ).scalars()
        )


def test_entities_match_oracle(engine: Engine, tmp_path: Path) -> None:
    sha = _index(engine, tmp_path)
    found = {(p["framework"], p["qualified_name"]) for p in _entity_payloads(engine, sha)}
    assert found == EXPECTED_ENTITIES


def test_fields_match_oracle(engine: Engine, tmp_path: Path) -> None:
    sha = _index(engine, tmp_path)
    by_key = {p["qualified_name"]: p for p in _entity_payloads(engine, sha)}
    for qualified_name, expected in EXPECTED_FIELDS.items():
        names = {f["name"] for f in by_key[qualified_name]["fields"]}
        assert names == expected, qualified_name


def test_bare_declarative_base_is_not_an_entity(engine: Engine, tmp_path: Path) -> None:
    sha = _index(engine, tmp_path)
    keys = {p["qualified_name"] for p in _entity_payloads(engine, sha)}
    assert "shop.models.Base" not in keys  # no __tablename__, no columns -> not a domain entity


def test_dynamic_model_is_a_known_gap(engine: Engine, tmp_path: Path) -> None:
    sha = _index(engine, tmp_path)
    keys = {p["qualified_name"] for p in _entity_payloads(engine, sha)}
    assert KNOWN_GAP not in keys  # documented blind spot, surfaced — not silently "found"


def test_entities_grounded_on_class_spans(engine: Engine, tmp_path: Path) -> None:
    sha = _index(engine, tmp_path)
    join = (
        m.snapshot_entry.join(
            m.artifact, m.artifact.c.artifact_id == m.snapshot_entry.c.artifact_id
        )
        .join(
            m.artifact_derived_from,
            m.artifact_derived_from.c.artifact_id == m.artifact.c.artifact_id,
        )
        .join(m.code_span, m.code_span.c.span_id == m.artifact_derived_from.c.span_id)
    )
    with engine.connect() as conn:
        rows = conn.execute(
            select(m.artifact.c.payload, m.code_span.c.span_kind)
            .select_from(join)
            .where(m.snapshot_entry.c.sha == sha, m.artifact.c.kind == "entity")
        ).all()
    assert rows  # every entity is grounded (>=1 derived_from)
    for row in rows:
        assert row.span_kind == "class"
        assert row.payload["span_mapping"] == "exact"
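
The detection rules this gate exercises can be illustrated with a small standalone sketch. It is not the PR's extractor: the PR uses tree-sitter, whereas this sketch swaps in the standard-library `ast` module so it runs without extra dependencies. The function name `detect_entities` and the "has `__tablename__`" heuristic for SQLAlchemy models are assumptions made for illustration only; like the real extractor, the sketch parses source text and never imports or executes it.

```python
import ast

# Same shape as the fixture module in the test above.
SOURCE = '''
from dataclasses import dataclass
from pydantic import BaseModel, create_model
from sqlalchemy import Column, Integer
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Order(BaseModel):
    id: int
    total: float = 0.0
    note: str | None = None

@dataclass
class LineItem:
    sku: str
    qty: int = 1

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column()
    legacy = Column(Integer)

Dynamic = create_model("Dynamic", x=(int, ...))
'''


def detect_entities(source: str, module: str) -> dict[str, tuple[str, set[str]]]:
    """Hypothetical helper: map qualified class name -> (framework, field names)."""
    found: dict[str, tuple[str, set[str]]] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue  # create_model(...) is a plain assignment, so it is never seen here
        bases = {b.id for b in node.bases if isinstance(b, ast.Name)}
        decorators = {d.id for d in node.decorator_list if isinstance(d, ast.Name)}
        fields: set[str] = set()
        has_tablename = False
        for stmt in node.body:
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                fields.add(stmt.target.id)  # annotated attribute, e.g. `id: int`
            elif isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if not isinstance(target, ast.Name):
                        continue
                    if target.id == "__tablename__":
                        has_tablename = True  # metadata, not a field
                    elif not target.id.startswith("__"):
                        fields.add(target.id)  # e.g. `legacy = Column(Integer)`
        if "BaseModel" in bases:
            framework = "pydantic"
        elif "dataclass" in decorators:
            framework = "dataclass"
        elif has_tablename:  # assumed signal; a bare declarative Base has none
            framework = "sqlalchemy"
        else:
            continue
        found[f"{module}.{node.name}"] = (framework, fields)
    return found


entities = detect_entities(SOURCE, "shop.models")
for name, (framework, fields) in sorted(entities.items()):
    print(name, framework, sorted(fields))
```

Because the sketch only matches direct base names and bare decorators, it shares the limits the changelog lists as documented gaps: transitive bases, imperative SQLAlchemy mapping, and dynamically built models are not detected.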