feat: cross-file entity links + Tier-3 entity questions #10

Merged: v0ropaev merged 1 commit into master from feat/entity-links on Jun 20, 2026

@v0ropaev (Owner)

Extends the entities extractor so a single entity artifact can span multiple files (the structural reason knowbase beats RAG), mirroring how the API extractor grounds a route on its handler + response_model across files. Also extends the Tier-3 A/B comparison to a second knowledge type, entities.

What

  • entities.py (two-pass). Pass 1 classifies every class and indexes entities by short name; pass 2 resolves each entity's field-type references (list[LineItem], User | None, Mapped[list["Order"]]) and SQLAlchemy relationship() targets against that index, adding related_entity grounding edges (cross-file when the target lives elsewhere) + a related_entities payload. extractor_version 1 → 2 (the derived-from set changes, so ids rotate; gated per DESIGN §6; one-hop invalidation now links referenced→referencing entity). ForeignKey(...), transitive/aliased imports, and association tables are documented gaps.
  • embed/text.py. Enriched entity embed text (qualified name + field + related-entity names) so entity questions rank in search_knowledge.
  • Tier-1 gate. Added a cross-file Cart → Order pair; test_cross_file_entity_links_grounded asserts the artifact is grounded on both files (role related_entity) and the reference resolves to shop.models.Order.
  • Tier-3 A/B. A two-file Order/LineItem entity fixture + 3 entity questions; the harness indexes [FastAPIExtractor, EntityExtractor] and the generic recall loop asserts knowbase cross-file recall@5 == 1.0 for entity questions too (now 11 questions). RAG arm stays tracked/non-asserted.
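
For readers who have not seen the diff, the two-pass resolution described above can be sketched roughly as follows. This is a minimal illustration, not the code in entities.py: `EntityDef`, `link_entities`, `grounded_files`, and the file paths are hypothetical names chosen for the example, annotations are scanned with a plain identifier regex rather than a real parser, and relationship() targets are not handled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EntityDef:
    """Hypothetical stand-in for a classified entity; not knowbase's real model."""
    qualified_name: str               # e.g. "shop.models.Order"
    file: str                         # file the class is defined in
    field_types: dict[str, str]       # field name -> annotation source text
    related_entities: list[str] = field(default_factory=list)
    grounded_files: set[str] = field(default_factory=set)


_IDENT = re.compile(r"[A-Za-z_]\w*")


def short_name(qualified_name: str) -> str:
    return qualified_name.rsplit(".", 1)[-1]


def link_entities(entities: list[EntityDef]) -> None:
    # Pass 1: index every entity by short name.
    # Assumption: short names are unique; on a collision the last one wins.
    index = {short_name(e.qualified_name): e for e in entities}

    # Pass 2: resolve identifiers found in each annotation against the index.
    for entity in entities:
        entity.grounded_files.add(entity.file)
        for annotation in entity.field_types.values():
            for name in _IDENT.findall(annotation):
                target = index.get(name)
                if target is None or target is entity:
                    continue  # not an entity (list, Mapped, ...) or a self-reference
                if target.qualified_name not in entity.related_entities:
                    entity.related_entities.append(target.qualified_name)
                # A same-file target adds no new file; a cross-file one does.
                entity.grounded_files.add(target.file)


# Illustrative fixture (paths and fields are assumptions, not the real test data).
cart = EntityDef("shop.cart.Cart", "shop/cart.py",
                 {"order": 'Mapped[list["Order"]]', "owner": "User | None"})
order = EntityDef("shop.models.Order", "shop/models.py", {"items": "list[LineItem]"})
line_item = EntityDef("shop.models.LineItem", "shop/models.py", {"sku": "str"})
link_entities([cart, order, line_item])
```

In this sketch `cart` ends up grounded on both files, while `order` stays grounded on its own file because `LineItem` lives alongside it. `User` is ignored since it is not in the index.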

Verification

  • 52 eval tests pass; ruff + mypy --strict clean.
  • Tier-3: knowbase cross-file recall@5 == 1.000 for all 11 questions; recall@1 separator knowbase 0.682 vs RAG 0.409.
  • End-to-end on knowbase itself: 9/25 entities resolve links, incl. a genuine cross-file one — ExtractContext (base.py) → ParsedSpan (structural/interface.py); same-file links correctly add no extra file.
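
As a reading aid for the recall figures above, here is one common way to compute recall@k per question and average it over a question set. The harness's exact definition is not shown in this PR, so treat `recall_at_k` and `mean_recall_at_k` as hypothetical helpers under the assumption that each question has a set of expected ids and recall is the fraction of them found in the top k results.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant ids that appear among the top-k ranked ids."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per question."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

Under this definition a question with two expected files scores 0.5 at k=1 if only one of them is ranked first, which is how a mean over questions can land between whole-question fractions.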

Store/queries unchanged (kind-opaque). Out of scope / documented follow-ups: ForeignKey("table.col") resolution, transitive imports, a separate entity_relation graph kind.

Extend the entities extractor so one `entity` artifact spans multiple files —
the structural reason knowbase beats RAG — mirroring how the API extractor
grounds a route on its handler + response_model across files.

- entities.py: two-pass extract. Pass 1 classifies every class and indexes
  entities by short name; pass 2 resolves each entity's field-type references
  and SQLAlchemy relationship() targets against that index, adding
  `related_entity` grounding edges (cross-file when the target lives elsewhere)
  + a `related_entities` payload list. extractor_version 1 -> 2 (derived_from
  set changes, so ids rotate; gated per DESIGN §6). FK / transitive imports are
  documented gaps.
- embed/text.py: enrich entity embed text (qualified name, field + related
  names) so entity questions rank in search_knowledge.
- tier1_entities_test.py: add a cross-file Cart -> Order pair and assert the
  artifact is grounded on both files (role related_entity).
- questions.py + tier3_rag_test.py: a two-file Order/LineItem entity fixture and
  3 entity questions; Tier-3 indexes both extractors and asserts knowbase
  cross-file recall@5 == 1.0 for entity questions too (now 11 questions).

52 eval tests pass; ruff + mypy --strict clean. End-to-end on knowbase itself:
9/25 entities resolve links incl. a true cross-file one (ExtractContext ->
ParsedSpan).
@v0ropaev merged commit b93997d into master on Jun 20, 2026
1 check passed
@v0ropaev deleted the feat/entity-links branch on June 20, 2026 23:18