MCP: vector chunk matches with unified match interface #11
Closed
endolith wants to merge 1 commit into
Add ChunkScoringBackend.ScoreMessageChunks to score every embedded chunk within a message against the query vector (best-first, optional min_score).

- search_messages mode=vector|hybrid: each hit includes matches[] with chunk snippets and similarity scores (up to 5 per message)
- search_in_message mode=vector: paginated chunk matches for one message
- search_message_bodies: context_snippets renamed to matches[] with char_offset and line for consistency with keyword search_in_message

Hybrid engine returns QueryVector in ResultMeta to avoid re-embedding when enriching search results. Snippets are sliced from preprocessed embed text; char_offset maps into raw body_text when the chunk is in the body region.

Co-authored-by: endolith <endolith@gmail.com>
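For readers who want a concrete picture of the scoring step, here is a minimal Python sketch of "score every chunk, best-first, optional min_score, cap per message". It is not the code from this PR: the names `ChunkMatch`, `cosine`, and `score_message_chunks`, the `offset` parameter used for pagination, and the choice of cosine similarity are all assumptions made for illustration. The actual backends (sqlitevec, pgvector) may compute and normalize similarity differently.

```python
import math
from dataclasses import dataclass


@dataclass
class ChunkMatch:
    chunk_index: int  # hypothetical: position of the chunk within the message
    score: float      # similarity to the query vector (cosine, by assumption)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score_message_chunks(query_vec, chunk_vecs, min_score=None, limit=5, offset=0):
    """Score every chunk, drop those below min_score, and return one
    best-first page of at most `limit` matches starting at `offset`."""
    matches = [ChunkMatch(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    if min_score is not None:
        matches = [m for m in matches if m.score >= min_score]
    matches.sort(key=lambda m: m.score, reverse=True)
    return matches[offset:offset + limit]
```

With `limit=5` and `offset=0` this mirrors the "up to 5 per message" behavior described for search_messages; stepping `offset` by `limit` gives the kind of paging described for search_in_message.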
Closing in favor of #15.
Summary
Adds within-message vector chunk scoring and harmonizes match excerpts across MCP search tools.
Changes
- `ChunkScoringBackend.ScoreMessageChunks`: scores every embedded chunk of a message against the query vector (sqlitevec + pgvector)
- `search_messages` (`mode=vector|hybrid`): each hit includes `matches[]` (up to 5 chunks, best similarity first, optional `min_score`)
- `search_in_message` (`mode=vector`): paginated semantic chunk matches for one message (same `matches` shape as keyword mode, with `score` on each)
- `search_message_bodies`: `context_snippets` replaced by `matches[]` with `char_offset`, `snippet`, and `line` (keyword matches, no score)

Usage
Each vector match includes `char_offset` (byte offset into `body_text` when the chunk is in the body), `snippet`, `line`, and `score` (0–1 similarity). Use `get_message` with `center_at=<char_offset>` to read more context.

Snippets reflect preprocessed embed text (subject prefix + cleaned body), matching what was embedded.
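To make the relationship between `char_offset`, `line`, and a centered context read more concrete, here is a small Python sketch. It is only an illustration under stated assumptions: `line_of` and `context_window` are hypothetical helpers, not part of this PR, and they assume that `char_offset` is a UTF-8 byte offset into `body_text` (as described above), that line numbers are 1-based, and that a centered read takes a fixed-width window around the offset. The real `get_message` tool may behave differently.

```python
def line_of(body_text: str, char_offset: int) -> int:
    """1-based line number containing the byte at char_offset
    (assumption: char_offset is a UTF-8 byte offset)."""
    raw = body_text.encode("utf-8")
    return raw[:char_offset].count(b"\n") + 1


def context_window(body_text: str, center_at: int, width: int = 200) -> str:
    """Return up to `width` bytes of body_text, roughly centered on center_at."""
    raw = body_text.encode("utf-8")
    start = max(0, center_at - width // 2)
    end = min(len(raw), start + width)
    # errors="ignore" drops a multi-byte character that is split at either edge.
    return raw[start:end].decode("utf-8", errors="ignore")
```

Working on bytes rather than Python string indices keeps the sketch consistent with the byte-offset wording above; for non-ASCII bodies the two would otherwise drift apart.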
Stack
Stacks on PR #9 (`split/pr4-search-message-bodies`).