feat: deterministic entities extractor (pydantic / dataclass / SQLAlchemy) #9
Merged
…hemy)

The next deterministic family after imports + the API contract (DESIGN §14 #1). A fully static `EntityExtractor` emits one `entity` artifact per domain class (pydantic `BaseModel`, `@dataclass`, and SQLAlchemy declarative model) with its fields, grounded on the class-definition span. No code execution (tree-sitter re-parse, mirroring `fastapi_contract`); detection signals + limits are recorded in the payload; `framework_versions` (pydantic/sqlalchemy) are folded into the key.

- `src/kb/extract/deterministic/entities.py`: detection (`dataclass` decorator; pydantic `BaseModel`/`BaseSettings` base; SQLAlchemy `__tablename__` / `Mapped[]` / `mapped_column`/`Column`), field parsing, artifact assembly.
- `src/kb/eval/tier1_entities_test.py`: the HARD gate (#8), a hand-labeled oracle for entities + fields; a bare declarative `Base` is NOT an entity; `create_model()` is a known gap; every entity is grounded on a `class` span.
- Register in the index pipeline (`cli.py`); add an `entity` branch to MCP `records.summarize`.
- Docs: README (eight gates, architecture row, status bullet), DESIGN §11/§14, CHANGELOG [Unreleased].

51 eval tests pass (was 46); ruff + `mypy --strict` clean. End-to-end on knowbase itself yields 23 entities (17 dataclass, 6 pydantic; 0 false positives, since it uses SQLAlchemy Core, not declarative).
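The detection signals and field parsing listed in the commit message can be illustrated with a short sketch. The PR re-parses `raw_text` with tree-sitter; this sketch swaps in Python's standard-library `ast` module purely for brevity, so it is not the PR's code. `classify` and `entity_fields` are hypothetical names, and matching on the trailing name (so `sa.Column` and `Column` look alike) is a simplification. Only direct bases are inspected, which reproduces the documented gap on transitive bases.

```python
from __future__ import annotations

import ast

# Signal vocabularies taken from the PR; matching on trailing names is a simplification.
PYDANTIC_BASES = {"BaseModel", "BaseSettings"}
SQLA_CALLS = {"mapped_column", "Column"}


def _name(expr: ast.expr) -> str | None:
    """Reduce `X`, `mod.X`, `X(...)` and `X[...]` to the trailing name `X`."""
    if isinstance(expr, ast.Call):
        return _name(expr.func)
    if isinstance(expr, ast.Subscript):
        return _name(expr.value)
    if isinstance(expr, ast.Attribute):
        return expr.attr
    if isinstance(expr, ast.Name):
        return expr.id
    return None


def classify(node: ast.ClassDef) -> str | None:
    """Hypothetical: return the framework family, or None if not an entity."""
    if any(_name(d) == "dataclass" for d in node.decorator_list):
        return "dataclass"
    # Direct bases only: a subclass of a user-defined model is a known gap.
    if any(_name(b) in PYDANTIC_BASES for b in node.bases):
        return "pydantic"
    for stmt in node.body:
        if isinstance(stmt, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__tablename__" for t in stmt.targets
        ):
            return "sqlalchemy"
        if isinstance(stmt, ast.AnnAssign) and _name(stmt.annotation) == "Mapped":
            return "sqlalchemy"
        if isinstance(stmt, (ast.Assign, ast.AnnAssign)) and stmt.value is not None:
            if _name(stmt.value) in SQLA_CALLS:
                return "sqlalchemy"
    # e.g. `class Base(DeclarativeBase): pass` carries no signal, so it is skipped.
    return None


def entity_fields(node: ast.ClassDef) -> list[str]:
    """Hypothetical: annotated attributes plus `name = Column(...)` assignments."""
    names: list[str] = []
    for stmt in node.body:
        if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
            names.append(stmt.target.id)
        elif isinstance(stmt, ast.Assign) and _name(stmt.value) in SQLA_CALLS:
            names.extend(t.id for t in stmt.targets if isinstance(t, ast.Name))
    return names


SOURCE = '''\
from dataclasses import dataclass
from pydantic import BaseModel
from sqlalchemy import Column, Integer
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

class User(BaseModel):
    name: str

class Order(Base):
    __tablename__ = "orders"
    id: Mapped[int] = mapped_column(primary_key=True)
    total = Column(Integer)
'''

# Parse only; nothing in SOURCE is imported or executed.
entities = {
    node.name: (kind, entity_fields(node))
    for node in ast.parse(SOURCE).body
    if isinstance(node, ast.ClassDef) and (kind := classify(node)) is not None
}
```

Because the module is parsed rather than imported, user code never runs and neither pydantic nor SQLAlchemy has to be installed for the pass to work.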
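The commit message folds `framework_versions` (pydantic/sqlalchemy) into the artifact key, citing DESIGN §6, which is not reproduced here. A minimal sketch of why that matters, assuming the key is a SHA-256 digest over a canonical JSON encoding; `artifact_key`, its parameters, and the sample values are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def artifact_key(
    kind: str,
    path: str,
    content_sha: str,
    framework_versions: dict[str, str],
) -> str:
    """Hypothetical key scheme: a framework upgrade yields a new key, so a
    cached entity artifact is recomputed instead of being silently reused."""
    # sort_keys plus fixed separators give a canonical encoding, so the key
    # does not depend on dict insertion order.
    payload = json.dumps(
        {
            "kind": kind,
            "path": path,
            "content_sha": content_sha,
            "framework_versions": framework_versions,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative inputs only (hypothetical path, hash and versions).
k1 = artifact_key("entity", "app/models.py", "abc123", {"pydantic": "2.7.1", "sqlalchemy": "2.0.30"})
k2 = artifact_key("entity", "app/models.py", "abc123", {"sqlalchemy": "2.0.30", "pydantic": "2.7.1"})
k3 = artifact_key("entity", "app/models.py", "abc123", {"pydantic": "2.8.0", "sqlalchemy": "2.0.30"})
```

`k1` and `k2` are equal because the encoding is order-insensitive, while `k3` differs because the pydantic version changed.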
The next deterministic family after imports + the API contract (DESIGN.md §14 #1, dropped from the README roadmap in the v0.2.0 refresh). A fully static `EntityExtractor` emits one `entity` artifact per domain class (pydantic `BaseModel`, `@dataclass`, SQLAlchemy declarative model) with its fields, grounded on the class-definition span. It is the cheapest, highest-trust way to broaden the knowledge surface, and the artifacts are served immediately via MCP `get_knowledge`/`search_knowledge`.

Approach

- No code execution: a tree-sitter re-parse of `raw_text`, mirroring `fastapi_contract`. No new dependency.
- Detection signals: `@dataclass` decorator; pydantic `BaseModel`/`BaseSettings` base; SQLAlchemy `__tablename__`/`Mapped[...]`/`mapped_column(...)`/`Column(...)`. A bare declarative `Base` is correctly not an entity. Transitive bases and imperative SQLAlchemy mapping are documented gaps, not silent losses.
- `framework_versions` (pydantic/sqlalchemy) folded into the artifact key per DESIGN §6.

Gate (HARD #8)

`tier1_entities_test.py` (hand-labeled oracle, following the imports-gate pattern): extracted entities + fields match the oracle; `Base` is not an entity; a `create_model(...)` model is a known gap; every entity is grounded on a `class` span.

Verification

- 51 eval tests pass (was 46); ruff + `mypy --strict` clean.
- … `summarize()`.

Touch-points

New `entities.py` + `tier1_entities_test.py`; register in `cli.py`; one `summarize` branch in `mcp/records.py`; docs (README eight-gates/architecture/status, DESIGN §11/§14, CHANGELOG `[Unreleased]`). Store/queries unchanged (kind-opaque).

Out of scope (documented follow-ups): cross-file entity links (relationship/FK target grounding, like the API extractor's `response_model`), Enum/TypedDict/attrs, Tier-3 entity questions.
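The HARD #8 gate combines a hand-labeled oracle with structural checks. A minimal sketch of that comparison, assuming extracted entities arrive as `{name: (fields, start_line)}`; `gate_failures`, these data shapes, and excusing the `create_model(...)` model as a known gap are hypothetical illustrations, not the contents of `tier1_entities_test.py`.

```python
from __future__ import annotations


def gate_failures(
    source: str,
    extracted: dict[str, tuple[list[str], int]],
    oracle: dict[str, list[str]],
    known_gaps: frozenset[str] = frozenset(),
) -> list[str]:
    """Hypothetical gate: return failures; an empty list means it passes.

    `extracted` maps entity name -> (field names, 1-based start line of span).
    """
    failures: list[str] = []
    lines = source.splitlines()
    # 1. Every oracle entity is found, except documented known gaps.
    for name in sorted(set(oracle) - set(extracted) - known_gaps):
        failures.append(f"missing entity: {name}")
    # 2. Nothing extra (e.g. a bare declarative Base) is reported.
    for name in sorted(set(extracted) - set(oracle)):
        failures.append(f"unexpected entity: {name}")
    for name, (fields, start) in sorted(extracted.items()):
        # 3. Fields match the hand label, in declaration order.
        if name in oracle and fields != oracle[name]:
            failures.append(f"field mismatch on {name}: {fields} != {oracle[name]}")
        # 4. Grounding: the span must open on the `class` statement itself.
        if not (1 <= start <= len(lines)) or not lines[start - 1].lstrip().startswith("class "):
            failures.append(f"{name} not grounded on a class span")
    return failures


SOURCE = '''\
from pydantic import BaseModel, create_model

class User(BaseModel):
    name: str

Dynamic = create_model("Dynamic", a=(int, 0))
'''

ORACLE = {"User": ["name"], "Dynamic": ["a"]}
GAPS = frozenset({"Dynamic"})  # create_model(...) is invisible to a static pass

ok = gate_failures(SOURCE, {"User": (["name"], 3)}, ORACLE, GAPS)
bad = gate_failures(SOURCE, {"User": (["name", "email"], 1), "Base": ([], 99)}, ORACLE, GAPS)
```

Keeping the known gap in the oracle while excusing it means that closing the gap later only requires dropping it from `GAPS`, not re-labeling.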