MCP: vector chunk matches with unified match interface #11

Closed
endolith wants to merge 1 commit into split/pr4-search-message-bodies from cursor/vector-chunk-matches-9269

Conversation

@endolith

Owner

Summary

Adds within-message vector chunk scoring and harmonizes match excerpts across MCP search tools.

Changes

  • ChunkScoringBackend.ScoreMessageChunks: scores every embedded chunk of a message against the query vector (sqlitevec and pgvector backends)
  • search_messages (mode=vector|hybrid): each hit includes matches[] (up to 5 chunks, best similarity first, optional min_score)
  • search_in_message (mode=vector): paginated semantic chunk matches for one message (same matches[] shape as keyword mode, plus a score on each)
  • search_message_bodies: context_snippets replaced by matches[] with char_offset, snippet, and line (keyword matches, so no score)

Usage

// Hybrid search with chunk excerpts
search_messages { "query": "quarterly budget", "mode": "hybrid", "min_score": 0.3 }

// Within-message semantic chunks
search_in_message { "id": 123, "query": "project deadline", "mode": "vector", "min_score": 0.2 }

Each vector match includes char_offset (byte offset into body_text when the chunk is in the body), snippet, line, and score (0–1 similarity). Use get_message with center_at=<char_offset> to read more context.

Snippets reflect preprocessed embed text (subject prefix + cleaned body), matching what was embedded.
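Because snippets come from the preprocessed embed text while char_offset points into the raw body_text, an offset has to be translated between the two. The sketch below shows one way to do that, under loud assumptions: the cleaning step is modeled as collapsing whitespace runs (a hypothetical stand-in, not the project's actual preprocessing), the subject prefix format is invented, and `cleanWithMap` and `bodyOffset` are hypothetical helpers. The cleaner records, for every cleaned byte, the raw byte it came from.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanWithMap is a hypothetical stand-in for body preprocessing: it collapses
// each run of ASCII whitespace into one space and records, for every byte of
// the cleaned output, the byte offset in raw that produced it. Non-ASCII bytes
// are copied verbatim, so offsets stay byte offsets.
func cleanWithMap(raw string) (string, []int) {
	var b strings.Builder
	idx := make([]int, 0, len(raw))
	inSpace := false
	for i := 0; i < len(raw); i++ {
		c := raw[i]
		if c == ' ' || c == '\t' || c == '\n' || c == '\r' {
			if !inSpace {
				b.WriteByte(' ')
				idx = append(idx, i)
			}
			inSpace = true
			continue
		}
		inSpace = false
		b.WriteByte(c)
		idx = append(idx, i)
	}
	return b.String(), idx
}

// bodyOffset maps a chunk start offset in embed text (prefix + cleaned body)
// back to a byte offset in the raw body. ok is false when the offset falls in
// the subject prefix, i.e. outside the body region.
func bodyOffset(embedOff, prefixLen int, cleanToRaw []int) (int, bool) {
	rel := embedOff - prefixLen
	if rel < 0 || rel >= len(cleanToRaw) {
		return 0, false
	}
	return cleanToRaw[rel], true
}

func main() {
	subject := "Subject: Budget\n\n" // hypothetical prefix format
	raw := "Hello   team,\n\nThe quarterly budget is due."
	cleaned, m := cleanWithMap(raw)
	embed := subject + cleaned
	start := strings.Index(embed, "quarterly")
	if off, ok := bodyOffset(start, len(subject), m); ok {
		fmt.Println(off, raw[off:off+len("quarterly")])
	}
}
```

The program prints `19 quarterly`: the match sits at offset 33 in the embed text but at byte 19 in the raw body, because the prefix and the collapsed whitespace shift positions. A chunk starting inside the prefix gets no body offset, which matches the "when the chunk is in the body" qualifier above.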

Stack

Stacks on PR #9 (split/pr4-search-message-bodies).


Add ChunkScoringBackend.ScoreMessageChunks to score every embedded chunk
within a message against the query vector (best-first, optional min_score).

- search_messages mode=vector|hybrid: each hit includes matches[] with
  chunk snippets and similarity scores (up to 5 per message)
- search_in_message mode=vector: paginated chunk matches for one message
- search_message_bodies: context_snippets renamed to matches[] with
  char_offset and line for consistency with keyword search_in_message

Hybrid engine returns QueryVector in ResultMeta to avoid re-embedding
when enriching search results. Snippets are sliced from preprocessed
embed text; char_offset maps into raw body_text when the chunk is in
the body region.

Co-authored-by: endolith <endolith@gmail.com>
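The commit message notes that the hybrid engine returns QueryVector in ResultMeta so that enrichment does not re-embed the query. A minimal sketch of that reuse pattern follows; only the `QueryVector` field name comes from the commit message, while the reduced `ResultMeta`, the `Embedder` interface, `countingEmbedder`, and `queryVectorFor` are hypothetical.

```go
package main

import "fmt"

// ResultMeta is a hypothetical reduction of the engine's result metadata;
// only the QueryVector field name is taken from the commit message.
type ResultMeta struct {
	QueryVector []float32
}

// Embedder is a hypothetical interface for the embedding client.
type Embedder interface {
	Embed(text string) ([]float32, error)
}

// countingEmbedder is a fake embedder that counts calls to Embed.
type countingEmbedder struct{ calls int }

func (e *countingEmbedder) Embed(text string) ([]float32, error) {
	e.calls++
	return []float32{float32(len(text)), 1}, nil
}

// queryVectorFor reuses the vector the search engine already computed when
// present and only embeds the query again as a fallback.
func queryVectorFor(meta ResultMeta, query string, emb Embedder) ([]float32, error) {
	if len(meta.QueryVector) > 0 {
		return meta.QueryVector, nil
	}
	return emb.Embed(query)
}

func main() {
	emb := &countingEmbedder{}
	// The engine embeds the query once and reports the vector in its metadata.
	qv, _ := emb.Embed("quarterly budget")
	meta := ResultMeta{QueryVector: qv}
	// Enriching three hits with chunk matches reuses meta.QueryVector.
	for hit := 0; hit < 3; hit++ {
		if _, err := queryVectorFor(meta, "quarterly budget", emb); err != nil {
			panic(err)
		}
	}
	fmt.Println("embed calls:", emb.calls)
}
```

This prints `embed calls: 1`: however many hits are enriched, the embedding call happens once, which matters when embeddings come from a remote model.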
endolith force-pushed the cursor/vector-chunk-matches-9269 branch from deae01a to b3970d9 on June 21, 2026 01:45
endolith force-pushed the split/pr4-search-message-bodies branch from 1697fca to b9308a8 on June 21, 2026 01:45
endolith marked this pull request as ready for review on June 21, 2026 03:13
endolith marked this pull request as draft on June 21, 2026 03:14
@endolith

Owner Author

Close in favor of #15

endolith closed this on June 25, 2026
endolith deleted the cursor/vector-chunk-matches-9269 branch on June 25, 2026 23:54