Merged
17 changes: 12 additions & 5 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -12,13 +12,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Deterministic entities extractor** (`kb.extract.deterministic.entities`): a fully static
   (tree-sitter) extractor that emits one `entity` artifact per domain class — pydantic `BaseModel`,
   `@dataclass`, and SQLAlchemy declarative model — with its fields, grounded on the class-definition
-  span. Detection signals and limits are recorded in the payload (transitive bases / imperative
-  SQLAlchemy mapping are documented gaps, not silent losses); `framework_versions` (pydantic /
-  sqlalchemy) is folded into the artifact key. Surfaced via MCP `get_knowledge`/`search_knowledge`.
+  span **and, across files, on the first-party entities it references** (resolved from field-type
+  annotations and SQLAlchemy `relationship()` targets; role `related_entity`). One `entity:Order`
+  artifact then spans every file it depends on — the cross-file shape RAG-over-chunks misses.
+  Detection signals and limits are recorded in the payload (transitive bases, imperative SQLAlchemy
+  mapping, and `ForeignKey("table.col")` resolution are documented gaps, not silent losses);
+  `framework_versions` (pydantic / sqlalchemy) is folded into the artifact key. Surfaced via MCP
+  `get_knowledge`/`search_knowledge` (entity embed text enriched with field + related-entity names).
 - **Tier-1 entities gate** (`kb.eval.tier1_entities_test`): a hand-labeled HARD gate — extracted
   entities + fields match the oracle, a bare declarative `Base` is not an entity, a `create_model(...)`
-  model is asserted as a known gap, and every entity is grounded on a `class` span. Brings the headline
-  HARD gates to **eight**.
+  model is asserted as a known gap, every entity is grounded on a `class` span, and a cross-file
+  reference (`Cart` → `Order`) is grounded on both files. Brings the headline HARD gates to **eight**.
+- **Tier-3 entity questions** (`kb.eval.questions`): the knowledge-vs-RAG A/B now also covers domain
+  entities (a two-file `Order`/`LineItem` fixture), asserting knowbase cross-file recall@k == 1.0 for
+  entity questions as well as API-contract questions.

 ## [0.2.0] - 2026-06-02

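A minimal sketch of the cross-file link this entry describes, under loud assumptions: it uses Python's stdlib `ast` instead of tree-sitter, recognizes only `@dataclass` entities and absolute `from X import Y` imports, and ignores `relationship()` targets. `module_name`, `is_dataclass_def` and `resolve_related` are hypothetical helpers; the `(name, target_fq, "field_type")` tuples only mirror the `related_entities` fields asserted in the Tier-1 test, not the extractor's real payload code.

```python
import ast


def module_name(path: str, root: str = "src") -> str:
    """'src/app/domain/order.py' -> 'app.domain.order' (src-layout assumption)."""
    rel = path[len(root) + 1:] if path.startswith(root + "/") else path
    return rel.removesuffix(".py").replace("/", ".").removesuffix(".__init__")


def is_dataclass_def(node: ast.ClassDef) -> bool:
    """True for a class decorated directly with `@dataclass` or `@dataclass(...)`."""
    for dec in node.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if isinstance(target, ast.Name) and target.id == "dataclass":
            return True
    return False


def resolve_related(files: dict[str, str]) -> dict[str, set[tuple[str, str, str]]]:
    """Map each dataclass entity's FQ name to (name, target_fq, via) links to other entities."""
    trees = {module_name(p): ast.parse(src) for p, src in files.items() if p.endswith(".py")}
    # Pass 1: the fully qualified name of every entity in the snapshot.
    entities = {
        f"{mod}.{node.name}"
        for mod, tree in trees.items()
        for node in tree.body
        if isinstance(node, ast.ClassDef) and is_dataclass_def(node)
    }
    related: dict[str, set[tuple[str, str, str]]] = {}
    for mod, tree in trees.items():
        # Local name -> FQ name: absolute `from X import Y [as Z]` plus classes defined here.
        scope = {
            (alias.asname or alias.name): f"{node.module}.{alias.name}"
            for node in tree.body
            if isinstance(node, ast.ImportFrom) and node.module and node.level == 0
            for alias in node.names
        }
        scope.update({n.name: f"{mod}.{n.name}" for n in tree.body if isinstance(n, ast.ClassDef)})
        # Pass 2: resolve every bare name in each field annotation, e.g. list[LineItem].
        for node in tree.body:
            if not (isinstance(node, ast.ClassDef) and is_dataclass_def(node)):
                continue
            own = f"{mod}.{node.name}"
            links: set[tuple[str, str, str]] = set()
            for stmt in node.body:
                if not isinstance(stmt, ast.AnnAssign):
                    continue
                # String (forward-reference) annotations are not followed in this sketch.
                for name_node in ast.walk(stmt.annotation):
                    if isinstance(name_node, ast.Name):
                        target = scope.get(name_node.id)
                        if target in entities and target != own:
                            links.add((name_node.id, target, "field_type"))
            related[own] = links
    return related
```

Applied to the two-file `Order`/`LineItem` fixture added in `kb.eval.questions`, this maps `app.domain.order.Order` to `{("LineItem", "app.domain.line_item.LineItem", "field_type")}`. The two passes exist because a reference can only be resolved once every file's entities are known.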
2 changes: 1 addition & 1 deletion DESIGN.md
@@ -305,7 +305,7 @@ freshness(current|stale@sha)`, with a deterministic tie-break for reproducible e
 | Module | Responsibility | Key tech |
 |--------|----------------|----------|
 | `kb.structural` | Parse Python without executing it; enumerate symbols/imports/call-sites with per-SHA byte/line ranges; compute content-addressed span identity; incremental reparse. Hidden behind a `StructuralIndex`/`PathEngine` interface so a SCIP backend can replace tree-sitter later. | tree-sitter + tree-sitter-python (canonical bindings) |
-| `kb.extract.deterministic` | No-LLM extractors → exact artifacts (confidence=1.0): import graph; FastAPI API contract (static, cross-file grounded); domain entities (pydantic/dataclass/SQLAlchemy, static, hand-labeled gate); griffe library surface (planned). | grimp, tree-sitter queries, griffe (static) |
+| `kb.extract.deterministic` | No-LLM extractors → exact artifacts (confidence=1.0): import graph; FastAPI API contract (static, cross-file grounded); domain entities (pydantic/dataclass/SQLAlchemy, static, cross-file links to referenced entities, hand-labeled gate); griffe library surface (planned). | grimp, tree-sitter queries, griffe (static) |
 | `kb.introspect` | Eval-only runtime oracle: runs a FastAPI app in a network-blocked sandbox and emits `app.openapi()` for the Tier-1 API gate. Never on the index path. | subprocess sandbox, fastapi |
 | `kb.embed` | Replaceable embedding adapters + snapshot population for `search_knowledge`. Torch isolated behind the `embed` extra and a lazy import. | sentence-transformers (default), OpenAI (optional), pgvector |
 | `kb.rag` | Frozen pgvector RAG-over-source baseline — the "other arm" of the knowledge-vs-RAG A/B (no provenance/grounding). | deterministic line-window chunker, pgvector |
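The `kb.structural` row mentions content-addressed span identity. One way to picture it, as a hedged sketch rather than the locked `span_id` recipe (which this diff does not show): hash the span kind plus a location-independent structural fingerprint, and keep line ranges in a separate per-SHA map. Here `ast.dump()` stands in for tree-sitter's normalized S-expression, and `span_id`/`index_spans` are hypothetical names.

```python
import ast
import hashlib


def span_id(kind: str, source: str) -> str:
    """Hypothetical content-addressed id: a hash of the span kind plus a structural fingerprint."""
    # ast.dump() omits line/column attributes by default, and comments and spacing never
    # reach the AST, so the fingerprint is location- and formatting-independent.
    fingerprint = ast.dump(ast.parse(source))
    return hashlib.sha256(f"{kind}\0{fingerprint}".encode()).hexdigest()[:16]


def index_spans(module_source: str) -> dict[str, tuple[int, int]]:
    """span_id -> (start_line, end_line) for each top-level class/def at one SHA."""
    spans: dict[str, tuple[int, int]] = {}
    for node in ast.parse(module_source).body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            segment = ast.get_source_segment(module_source, node)
            # The location lives next to the id, not inside it.
            spans[span_id(kind, segment)] = (node.lineno, node.end_lineno)
    return spans
```

Because the id depends only on content, moving a class leaves its id unchanged (only its per-SHA location moves), editing its body yields a new id, and re-indexing the same source returns the same ids.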
6 changes: 3 additions & 3 deletions README.md
@@ -83,7 +83,7 @@ flowchart LR
 **v0.2 — spine + the first knowledge extractors, MCP serving, and the knowledge-vs-RAG gate.** Everything here grounds what it claims, and nothing it cannot:

 - **Provenance spine** — content-addressed `span_id` (LOCKED); tree-sitter spans with a normalized S-expression fingerprint and per-SHA location; a single-Postgres, Alembic-managed store with content-addressed idempotent writes; the ≥ 1 `derived_from` anti-hallucination invariant enforced in-app *and* by a deferred DB trigger; pygit2 git ingest (no checkout) with a diff-based invalidation seed.
-- **Deterministic extractors** — the **import / dependency graph** (grimp resolves the edge, tree-sitter grounds it on the exact import statement, with an honest `approximate` fallback for re-exports / relative / unmappable imports — never a silent loss); the **FastAPI API-contract** extractor, which grounds a single route **across files** (handler in `routes.py` + `response_model` class in `schemas.py`); and the **domain-entity** extractor (pydantic / dataclass / SQLAlchemy classes and their fields, grounded on the class definition — purely static, with documented detection limits).
+- **Deterministic extractors** — the **import / dependency graph** (grimp resolves the edge, tree-sitter grounds it on the exact import statement, with an honest `approximate` fallback for re-exports / relative / unmappable imports — never a silent loss); the **FastAPI API-contract** extractor, which grounds a single route **across files** (handler in `routes.py` + `response_model` class in `schemas.py`); and the **domain-entity** extractor (pydantic / dataclass / SQLAlchemy classes and their fields, grounded on the class definition **and cross-file on the entities they reference** — purely static, with documented detection limits).
 - **`kb introspect`** — a sandboxed, network-blocked `app.openapi()` oracle, eval-only and never on the index path, that the API gate scores the static contract against.
 - **Read-only MCP server** — `find_provenance`, `get_knowledge`, and `search_knowledge`, each returning provenance-carrying units (method + confidence + freshness).
 - **pgvector embeddings + semantic search** — a replaceable embedding provider (sentence-transformers by default, OpenAI optional) populated by a separate `kb embed` pass; torch stays out of the index path.
@@ -173,7 +173,7 @@ A Python package `kb` (uv, src-layout). Modules and their responsibilities:
 | `kb.git` | pygit2 ingest — reads blobs at a SHA (no checkout) — plus the diff-based invalidation seed. |
 | `kb.extract.deterministic.imports` | Deterministic import / dependency edges: tree-sitter spans grounded by line, grimp edge resolution. |
 | `kb.extract.deterministic.fastapi_contract` | Static FastAPI API-contract extractor; grounds a route across files (handler + `response_model` class), never imports user code. |
-| `kb.extract.deterministic.entities` | Static domain-entity extractor — pydantic / dataclass / SQLAlchemy classes + their fields, grounded on the class definition; detection signals and limits recorded in the payload. |
+| `kb.extract.deterministic.entities` | Static domain-entity extractor — pydantic / dataclass / SQLAlchemy classes + their fields, grounded on the class definition **and, across files, on the entities they reference** (field types + `relationship()`); detection signals and limits recorded in the payload. |
 | `kb.introspect` | Sandboxed, network-blocked `app.openapi()` oracle — eval-only ground truth for the API gate, never on the index path. |
 | `kb.mcp` | Read-only MCP server and its provenance-carrying records: `find_provenance`, `get_knowledge`, `search_knowledge`. |
 | `kb.embed` | Replaceable embedding adapters (sentence-transformers default, OpenAI optional) + snapshot population. Torch isolated behind the `embed` extra and a lazy import. |
@@ -199,7 +199,7 @@ CI (GitHub Actions, workflow **"CI"**, `.github/workflows/ci.yml`) runs ruff, `m
 3. **Tier-1 import oracle** — extracted import edges match a hand-labeled oracle, grounded on the actual import statement span; a dynamic import is asserted as a *known* gap, not a silent loss.
 4. **Tier-1 API oracle** — the statically-extracted FastAPI contract equals the app's own `openapi()` (from the sandboxed introspect oracle), and the route's cross-file grounding (handler + `response_model`) is asserted.
 5. **Tier-1 entities oracle** — extracted pydantic / dataclass / SQLAlchemy entities + their fields match a hand-labeled oracle, each grounded on its class span; a bare declarative `Base` is correctly *not* an entity and a `create_model(...)` model is asserted as a *known* gap.
-6. **Tier-3 knowledge-vs-RAG recall** — knowbase cross-file recall@k == 1.0 for every contract question (a *structural* floor: one artifact already spans both files, so it holds regardless of embedding quality); the RAG arm is reported but **never asserted**, so a model bump can't redden CI.
+6. **Tier-3 knowledge-vs-RAG recall** — knowbase cross-file recall@k == 1.0 for every cross-file question (API contracts **and** domain entities: in each case one artifact already spans both files, so the floor is *structural*, independent of embedding quality); the RAG arm is reported but **never asserted**, so a model bump can't redden CI.
 7. **Tier-4 one-hop invalidation** — a content diff invalidates *exactly* the artifacts whose grounding span changed (set-equality: no over-invalidation, no stale survivors); a version bump invalidates everything.
 8. **Invariants** — zero orphans (every snapshot artifact is grounded), and re-indexing the same SHA yields the identical set of artifact ids.
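The structural floor in gate 6 can be made concrete with a hypothetical definition of cross-file recall@k: the share of a question's expected files covered by the grounding of the top-k retrieved units. `Unit` and `cross_file_recall_at_k` are illustrative names, not `kb.eval` APIs, and the real gate's scoring may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A retrieved unit and the files it is grounded on (hypothetical shape)."""

    unit_id: str
    files: frozenset[str]


def cross_file_recall_at_k(ranked: list[Unit], expected_files: frozenset[str], k: int) -> float:
    """Fraction of the expected files covered by the grounding of the top-k units."""
    covered: set[str] = set()
    for unit in ranked[:k]:
        covered |= unit.files
    return len(covered & expected_files) / len(expected_files)
```

One retrieved artifact grounded on both files covers the whole expected set at k=1, whereas line-window chunks each cover a single file and need a separate hit per file.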

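Gate 7's set-equality check can be sketched as follows, assuming each artifact records the set of span ids it is grounded on and that span ids are content-addressed (so an edited span's old id is absent at the new SHA). `invalidated` and `changed_span_ids` are hypothetical names, not part of `kb`.

```python
def changed_span_ids(old_spans: set[str], new_spans: set[str]) -> set[str]:
    """With content-addressed ids, an edited span's old id is absent from the new SHA."""
    return old_spans - new_spans


def invalidated(
    grounding: dict[str, set[str]],
    changed_spans: set[str],
    version_bumped: bool = False,
) -> set[str]:
    """One-hop invalidation: the artifacts grounded on at least one changed span."""
    if version_bumped:
        # An extractor/framework version change invalidates every artifact.
        return set(grounding)
    # One hop only: no transitive walk over artifacts that depend on other artifacts.
    return {artifact for artifact, spans in grounding.items() if spans & changed_spans}
```

Comparing the whole returned set with `==`, rather than checking membership, is what rules out both over-invalidation and stale survivors.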
10 changes: 10 additions & 0 deletions src/kb/embed/text.py
@@ -25,4 +25,14 @@ def embed_text(kind: str, payload: dict[str, Any]) -> str:
         return " ".join(p for p in parts if p.strip())
     if kind == "import_edge":
         return f"{head} import {payload.get('importer', '')} {payload.get('imported', '')}"
+    if kind == "entity":
+        parts = [
+            head,
+            f"entity {payload.get('qualified_name', '')}",
+            f"framework {payload.get('framework', '')}",
+            "fields " + " ".join(str(f.get("name", "")) for f in payload.get("fields", [])),
+            "related "
+            + " ".join(str(r.get("name", "")) for r in payload.get("related_entities", [])),
+        ]
+        return " ".join(p for p in parts if p.strip())
     return head
43 changes: 39 additions & 4 deletions src/kb/eval/questions.py
@@ -1,8 +1,12 @@
-"""Cross-file-contract questions for the knowledge-vs-RAG comparison (DESIGN.md §9, §10).
+"""Cross-file questions for the knowledge-vs-RAG comparison (DESIGN.md §9, §10).

-Every expected answer spans `src/app/routes.py` (the route/handler) AND `src/app/schemas.py` (the
-pydantic model) of the Tier-1 FastAPI fixture — the case RAG-over-chunks fumbles. Reused by the
-deterministic gate (PR-3a) and the nightly LLM A/B (PR-3b).
+Two families, both spanning two files — the case RAG-over-chunks fumbles while a single grounded
+knowbase artifact already covers both:
+* **API contracts** — `src/app/routes.py` (route/handler) + `src/app/schemas.py` (response model),
+  from the Tier-1 FastAPI fixture (`FILES`).
+* **Domain entities** — `src/app/domain/order.py` (the `Order` entity) +
+  `src/app/domain/line_item.py` (the `LineItem` it references), from `ENTITY_FILES` below.
+Reused by the deterministic Tier-3 gate (PR-3a) and the nightly LLM A/B (PR-3b).
 """

 from __future__ import annotations
@@ -13,6 +17,30 @@
 SCHEMAS = "src/app/schemas.py"
 CROSS_FILE = frozenset({ROUTES, SCHEMAS})

+ORDER_ENTITY = "src/app/domain/order.py"
+LINE_ITEM_ENTITY = "src/app/domain/line_item.py"
+ENTITY_CROSS_FILE = frozenset({ORDER_ENTITY, LINE_ITEM_ENTITY})
+
+# A two-file entity fixture: Order references LineItem across files (the cross-file link).
+ENTITY_FILES = {
+    "src/app/domain/__init__.py": "",
+    "src/app/domain/line_item.py": (
+        "from dataclasses import dataclass\n\n\n"
+        "@dataclass\n"
+        "class LineItem:\n"
+        "    sku: str\n"
+        "    qty: int = 1\n"
+    ),
+    "src/app/domain/order.py": (
+        "from dataclasses import dataclass\n"
+        "from app.domain.line_item import LineItem\n\n\n"
+        "@dataclass\n"
+        "class Order:\n"
+        "    id: int\n"
+        "    items: list[LineItem]\n"
+    ),
+}
+

 @dataclass(frozen=True)
 class Question:
@@ -39,4 +67,11 @@ class Question:
              CROSS_FILE, frozenset({"api:GET /api/orders"})),
     Question("q8", "Which endpoint returns OrderOut and where is that model defined?",
              CROSS_FILE, frozenset({"api:GET /api/orders"})),
+    # Domain-entity questions — answered by the cross-file-grounded `entity:...Order` artifact.
+    Question("e1", "What does the Order entity contain, including its line items?",
+             ENTITY_CROSS_FILE, frozenset({"entity:app.domain.order.Order"})),
+    Question("e2", "What fields does the Order domain model have and what type are its items?",
+             ENTITY_CROSS_FILE, frozenset({"entity:app.domain.order.Order"})),
+    Question("e3", "Which model does the Order entity's items field reference, and where is it?",
+             ENTITY_CROSS_FILE, frozenset({"entity:app.domain.order.Order"})),
 ]
26 changes: 26 additions & 0 deletions src/kb/eval/tier1_entities_test.py
@@ -16,6 +16,7 @@
 from kb.eval._fixtures import make_git_repo
 from kb.extract.deterministic.entities import EntityExtractor
 from kb.store import models as m
+from kb.store.queries import provenance_for_artifact

 # A src-layout module: a pydantic model, a dataclass, a SQLAlchemy model (plus a bare declarative
 # Base that is NOT an entity), and a dynamically-built model (invisible to static parsing).
@@ -48,18 +49,29 @@
         "\n\n"
         'Dynamic = create_model("Dynamic", x=(int, ...))\n'
     ),
+    # A second module whose entity references one in shop/models.py (the cross-file link).
+    "src/shop/cart.py": (
+        "from dataclasses import dataclass\n"
+        "from shop.models import Order\n"
+        "\n\n"
+        "@dataclass\n"
+        "class Cart:\n"
+        "    orders: list[Order]\n"
+    ),
 }

 # Hand-labeled oracle: (framework, fq class). `Base` and `Dynamic` are deliberately absent.
 EXPECTED_ENTITIES = {
     ("pydantic", "shop.models.Order"),
     ("dataclass", "shop.models.LineItem"),
     ("sqlalchemy", "shop.models.User"),
+    ("dataclass", "shop.cart.Cart"),
 }
 EXPECTED_FIELDS = {
     "shop.models.Order": {"id", "total", "note"},
     "shop.models.LineItem": {"sku", "qty"},
     "shop.models.User": {"id", "name", "legacy"},  # __tablename__ is metadata, not a field
+    "shop.cart.Cart": {"orders"},
 }
 KNOWN_GAP = "shop.models.Dynamic"  # create_model(): dynamic, invisible to static analysis

@@ -132,3 +144,17 @@ def test_entities_grounded_on_class_spans(engine: Engine, tmp_path: Path) -> Non
     for row in rows:
         assert row.span_kind == "class"
         assert row.payload["span_mapping"] == "exact"
+
+
+def test_cross_file_entity_links_grounded(engine: Engine, tmp_path: Path) -> None:
+    """`Cart` (cart.py) references `Order` (models.py) -> the artifact spans BOTH files."""
+    sha = _index(engine, tmp_path)
+    with engine.connect() as conn:
+        prov = provenance_for_artifact(conn, sha, "entity:shop.cart.Cart")
+        by_role = {(p.file_path, p.role) for p in prov}
+    assert ("src/shop/cart.py", "class_definition") in by_role
+    assert ("src/shop/models.py", "related_entity") in by_role  # cross-file grounding
+
+    cart = next(p for p in _entity_payloads(engine, sha) if p["qualified_name"] == "shop.cart.Cart")
+    related = {(r["name"], r["target_fq"], r["via"]) for r in cart["related_entities"]}
+    assert ("Order", "shop.models.Order", "field_type") in related
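The detection signals this fixture exercises (direct pydantic base, `@dataclass`, SQLAlchemy declarative subclass, bare `Base` excluded, `create_model(...)` invisible) can be sketched with stdlib `ast`. This is an assumption-laden illustration, not `EntityExtractor`: `classify_entities` is hypothetical, only direct bases and decorators are inspected, and the fixture in the test below is written for the sketch because the full `shop/models.py` source is not shown in this diff.

```python
import ast


def _decorator_names(node: ast.ClassDef) -> set[str]:
    """Bare names of `@name` and `@name(...)` decorators."""
    names: set[str] = set()
    for dec in node.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if isinstance(target, ast.Name):
            names.add(target.id)
    return names


def classify_entities(source: str) -> dict[str, str]:
    """Class name -> framework, using direct bases/decorators only (hypothetical sketch)."""
    tree = ast.parse(source)  # parse only: user code is never imported or executed
    # Declarative bases: `class Base(DeclarativeBase)` or `Base = declarative_base()`.
    declarative: set[str] = set()
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and any(
            isinstance(b, ast.Name) and b.id == "DeclarativeBase" for b in node.bases
        ):
            declarative.add(node.name)
        elif (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == "declarative_base"
        ):
            declarative.update(t.id for t in node.targets if isinstance(t, ast.Name))
    found: dict[str, str] = {}
    for node in tree.body:
        # `Dynamic = create_model(...)` is an Assign, not a ClassDef: invisible here (known gap).
        if not isinstance(node, ast.ClassDef):
            continue
        bases = {b.id for b in node.bases if isinstance(b, ast.Name)}
        if "BaseModel" in bases:
            found[node.name] = "pydantic"
        elif "dataclass" in _decorator_names(node):
            found[node.name] = "dataclass"
        elif bases & declarative:
            # A mapped subclass of a declarative base; the bare `Base` itself never matches.
            found[node.name] = "sqlalchemy"
    return found
```

A subclass such as `Admin(User)` is deliberately missed: recognizing it needs transitive base resolution, which the changelog lists as a documented gap rather than a silent loss.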
13 changes: 10 additions & 3 deletions src/kb/eval/tier3_rag_test.py
@@ -15,8 +15,9 @@
 from kb.daemon.pipeline import index_commit
 from kb.embed.population import embed_snapshot
 from kb.eval._fixtures import make_git_repo
-from kb.eval.questions import QUESTIONS
+from kb.eval.questions import ENTITY_FILES, QUESTIONS
 from kb.eval.tier1_api_test import FILES
+from kb.extract.deterministic.entities import EntityExtractor
 from kb.extract.deterministic.fastapi_contract import FastAPIExtractor
 from kb.rag.baseline import index_rag_baseline, rag_retrieve
 from kb.store import queries as q
@@ -28,8 +29,14 @@
 @pytest.fixture(scope="module")
 def prepared(engine: Engine, tmp_path_factory, st_provider) -> tuple[Engine, str]:
     repo = tmp_path_factory.mktemp("tier3")
-    sha = make_git_repo(repo, [FILES])[0]
-    index_commit(engine, str(repo), sha, extractors=[FastAPIExtractor()], first_party_root="src")
+    sha = make_git_repo(repo, [{**FILES, **ENTITY_FILES}])[0]
+    index_commit(
+        engine,
+        str(repo),
+        sha,
+        extractors=[FastAPIExtractor(), EntityExtractor()],
+        first_party_root="src",
+    )
     embed_snapshot(engine, sha, st_provider)
     index_rag_baseline(engine, str(repo), sha, st_provider)
     return engine, sha