9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,15 @@

## [Unreleased]

### Added

- **Jina Embeddings v5 support**: two alternative embedding models, selectable via `LlamaCppConfig.embedModel`:
- `jina-embeddings-v5-text-nano` (239M params, 768 dim, 8K tokens, MTEB-EN 71.0, MMTEB 65.5)
- `jina-embeddings-v5-text-small` (677M params, 1024 dim, 32K tokens, MTEB-EN 71.7, MMTEB 67.0)
- Model-aware prompt formatting: `formatQueryForEmbedding` and `formatDocForEmbedding` auto-detect Jina v5 models and apply the correct `Query:` / `Document:` prefix format
- `LlamaCpp.getEmbedModelUri()` public getter for the configured embedding model URI
- `isJinaV5Model()` utility function exported from `llm.ts`

## [1.1.0] - 2026-02-20

QMD now speaks in **query documents** — structured multi-line queries where every line is typed (`lex:`, `vec:`, `hyde:`), combining keyword precision with semantic recall. A single plain query still works exactly as before (it's treated as an implicit `expand:` and auto-expanded by the LLM). Lex now supports quoted phrases and negation (`"C++ performance" -sports -athlete`), making intent-aware disambiguation practical. The formal query grammar is documented in `docs/SYNTAX.md`.
27 changes: 25 additions & 2 deletions README.md
@@ -252,10 +252,19 @@ QMD uses three local GGUF models (auto-downloaded on first use):

| Model | Purpose | Size |
|-------|---------|------|
| `embeddinggemma-300M-Q8_0` | Vector embeddings | ~300MB |
| `embeddinggemma-300M-Q8_0` | Vector embeddings (default) | ~300MB |
| `qwen3-reranker-0.6b-q8_0` | Re-ranking | ~640MB |
| `qmd-query-expansion-1.7B-q4_k_m` | Query expansion (fine-tuned) | ~1.1GB |

**Alternative embedding models** (configure via `LlamaCppConfig.embedModel`):

| Model | Params | Dimensions | Max Tokens | MTEB-EN | MMTEB | Size (Q8) |
|-------|--------|------------|------------|---------|-------|-----------|
| `jina-embeddings-v5-text-nano` | 239M | 768 | 8,192 | 71.0 | 65.5 | ~250MB |
| `jina-embeddings-v5-text-small` | 677M | 1,024 | 32,768 | 71.7 | 67.0 | ~710MB |

Jina v5 models use `Query: ` / `Document: ` task prefixes (auto-detected when configured).
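Selecting an alternative model is a one-field config change. The sketch below shows the intended flow from `LlamaCppConfig.embedModel` to `getEmbedModelUri()`; the `LlamaCppSketch` class is a simplified stand-in for illustration, not the real `LlamaCpp` implementation:

```typescript
// Model URIs as configured in src/llm.ts.
const JINA_V5_NANO_EMBED_MODEL =
  "hf:jinaai/jina-embeddings-v5-text-nano-retrieval-GGUF/v5-nano-retrieval-Q8_0.gguf";
const DEFAULT_EMBED_MODEL =
  "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";

interface LlamaCppConfig {
  embedModel?: string;
}

// Simplified stand-in (assumed constructor shape) showing how the
// configured embedModel becomes the value returned by getEmbedModelUri().
class LlamaCppSketch {
  private embedModelUri: string;

  constructor(config: LlamaCppConfig = {}) {
    // Fall back to the default EmbeddingGemma model when none is configured.
    this.embedModelUri = config.embedModel ?? DEFAULT_EMBED_MODEL;
  }

  getEmbedModelUri(): string {
    return this.embedModelUri;
  }
}

const llm = new LlamaCppSketch({ embedModel: JINA_V5_NANO_EMBED_MODEL });
console.log(llm.getEmbedModelUri()); // prints the Jina v5 nano hf: URI
```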

Models are downloaded from HuggingFace and cached in `~/.cache/qmd/models/`.

## Installation
@@ -599,10 +608,15 @@ Models are configured in `src/llm.ts` as HuggingFace URIs:
const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";

// Alternative: Jina Embeddings v5 (multilingual, longer context, higher quality)
const JINA_V5_NANO = "hf:jinaai/jina-embeddings-v5-text-nano-retrieval-GGUF/v5-nano-retrieval-Q8_0.gguf";
const JINA_V5_SMALL = "hf:jinaai/jina-embeddings-v5-text-small-retrieval-GGUF/v5-small-retrieval-Q8_0.gguf";
```

### EmbeddingGemma Prompt Format
### Embedding Prompt Formats

**EmbeddingGemma** (default):
```
// For queries
"task: search result | query: {query}"
@@ -611,6 +625,15 @@
"title: {title} | text: {content}"
```

**Jina v5** (auto-detected when configured):
```
// For queries
"Query: {query}"

// For documents
"Document: {content}"
```
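Both formats are produced by the formatter functions this PR adds. A quick sanity sketch (function bodies copied from the `src/llm.ts` diff in this PR):

```typescript
// Copied from the functions added in src/llm.ts by this PR.
function isJinaV5Model(modelUri: string): boolean {
  return modelUri.includes("jina-embeddings-v5");
}

function formatQueryForEmbedding(query: string, modelUri?: string): string {
  if (modelUri && isJinaV5Model(modelUri)) {
    return `Query: ${query}`;
  }
  return `task: search result | query: ${query}`;
}

function formatDocForEmbedding(text: string, title?: string, modelUri?: string): string {
  if (modelUri && isJinaV5Model(modelUri)) {
    return title ? `Document: ${title} ${text}` : `Document: ${text}`;
  }
  return `title: ${title || "none"} | text: ${text}`;
}

const jina =
  "hf:jinaai/jina-embeddings-v5-text-nano-retrieval-GGUF/v5-nano-retrieval-Q8_0.gguf";

console.log(formatQueryForEmbedding("rust lifetimes", jina));
// → "Query: rust lifetimes"
console.log(formatQueryForEmbedding("rust lifetimes"));
// → "task: search result | query: rust lifetimes"
console.log(formatDocForEmbedding("body text", "Notes", jina));
// → "Document: Notes body text"
```

Omitting the `modelUri` argument keeps the EmbeddingGemma default, so existing call sites are unaffected.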

### Qwen3-Reranker

Uses node-llama-cpp's `createRankingContext()` and `rankAndSort()` API for cross-encoder reranking. Returns documents sorted by relevance score (0.0 - 1.0).
36 changes: 32 additions & 4 deletions src/llm.ts
@@ -23,19 +23,34 @@ import { existsSync, mkdirSync, statSync, unlinkSync, readdirSync, readFileSync,
// Embedding Formatting Functions
// =============================================================================

/**
* Check whether a model URI refers to a Jina v5 embedding model.
*/
export function isJinaV5Model(modelUri: string): boolean {
return modelUri.includes("jina-embeddings-v5");
}

/**
* Format a query for embedding.
* Uses nomic-style task prefix format for embeddinggemma.
* EmbeddingGemma uses nomic-style: "task: search result | query: {query}"
* Jina v5 uses prefix format: "Query: {query}"
*/
export function formatQueryForEmbedding(query: string): string {
export function formatQueryForEmbedding(query: string, modelUri?: string): string {
if (modelUri && isJinaV5Model(modelUri)) {
return `Query: ${query}`;
}
return `task: search result | query: ${query}`;
}

/**
* Format a document for embedding.
* Uses nomic-style format with title and text fields.
* EmbeddingGemma uses nomic-style: "title: {title} | text: {content}"
* Jina v5 uses prefix format: "Document: {content}"
*/
export function formatDocForEmbedding(text: string, title?: string): string {
export function formatDocForEmbedding(text: string, title?: string, modelUri?: string): string {
if (modelUri && isJinaV5Model(modelUri)) {
return title ? `Document: ${title} ${text}` : `Document: ${text}`;
}
return `title: ${title || "none"} | text: ${text}`;
}

@@ -179,6 +194,12 @@ const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-re
// const DEFAULT_GENERATE_MODEL = "hf:ggml-org/Qwen3-0.6B-GGUF/Qwen3-0.6B-Q8_0.gguf";
const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";

// Alternative embedding models:
// Jina Embeddings v5 - task-targeted distillation, multilingual, longer context
// Uses "Query: " / "Document: " prefix format (not nomic-style task/title format)
export const JINA_V5_NANO_EMBED_MODEL = "hf:jinaai/jina-embeddings-v5-text-nano-retrieval-GGUF/v5-nano-retrieval-Q8_0.gguf";
export const JINA_V5_SMALL_EMBED_MODEL = "hf:jinaai/jina-embeddings-v5-text-small-retrieval-GGUF/v5-small-retrieval-Q8_0.gguf";

// Alternative generation models for query expansion:
// LiquidAI LFM2 - hybrid architecture optimized for edge/on-device inference
// Use these as base for fine-tuning with configs/sft_lfm2.yaml
@@ -394,6 +415,13 @@ export class LlamaCpp implements LLM {
this.disposeModelsOnInactivity = config.disposeModelsOnInactivity ?? false;
}

/**
* Get the configured embedding model URI.
*/
getEmbedModelUri(): string {
return this.embedModelUri;
}

/**
* Reset the inactivity timer. Called after each model operation.
* When timer fires, models are unloaded to free memory (if no active sessions).
7 changes: 4 additions & 3 deletions src/qmd.ts
@@ -1609,7 +1609,8 @@ async function vectorIndex(model: string = DEFAULT_EMBED_MODEL, force: boolean =
if (!firstChunk) {
throw new Error("No chunks available to embed");
}
const firstText = formatDocForEmbedding(firstChunk.text, firstChunk.title);
const embedModelUri = getDefaultLlamaCpp().getEmbedModelUri();
const firstText = formatDocForEmbedding(firstChunk.text, firstChunk.title, embedModelUri);
const firstResult = await session.embed(firstText);
if (!firstResult) {
throw new Error("Failed to get embedding dimensions from first chunk");
@@ -1628,7 +1629,7 @@ async function vectorIndex(model: string = DEFAULT_EMBED_MODEL, force: boolean =
const batch = allChunks.slice(batchStart, batchEnd);

// Format texts for embedding
const texts = batch.map(chunk => formatDocForEmbedding(chunk.text, chunk.title));
const texts = batch.map(chunk => formatDocForEmbedding(chunk.text, chunk.title, embedModelUri));

try {
// Batch embed all texts at once
@@ -1652,7 +1653,7 @@ async function vectorIndex(model: string = DEFAULT_EMBED_MODEL, force: boolean =
// If batch fails, try individual embeddings as fallback
for (const chunk of batch) {
try {
const text = formatDocForEmbedding(chunk.text, chunk.title);
const text = formatDocForEmbedding(chunk.text, chunk.title, embedModelUri);
const result = await session.embed(text);
if (result) {
insertEmbedding(db, chunk.hash, chunk.seq, chunk.pos, new Float32Array(result.embedding), model, now);
12 changes: 8 additions & 4 deletions src/store.ts
@@ -21,6 +21,7 @@ import {
getDefaultLlamaCpp,
formatQueryForEmbedding,
formatDocForEmbedding,
isJinaV5Model,
type RerankDocument,
type ILLMSession,
} from "./llm.js";
@@ -1352,7 +1353,7 @@ export function getActiveDocumentPaths(db: Database, collectionName: string): st
return rows.map(r => r.path);
}

export { formatQueryForEmbedding, formatDocForEmbedding };
export { formatQueryForEmbedding, formatDocForEmbedding, isJinaV5Model };

export function chunkDocument(
content: string,
@@ -2242,7 +2243,8 @@ export async function searchVec(db: Database, query: string, model: string, limi

async function getEmbedding(text: string, model: string, isQuery: boolean, session?: ILLMSession): Promise<number[] | null> {
// Format text using the appropriate prompt template
const formattedText = isQuery ? formatQueryForEmbedding(text) : formatDocForEmbedding(text);
const embedModelUri = getDefaultLlamaCpp().getEmbedModelUri();
const formattedText = isQuery ? formatQueryForEmbedding(text, embedModelUri) : formatDocForEmbedding(text, undefined, embedModelUri);
const result = session
? await session.embed(formattedText, { model, isQuery })
: await getDefaultLlamaCpp().embed(formattedText, { model, isQuery });
@@ -2985,7 +2987,8 @@ export async function hybridQuery(

// Batch embed all vector queries in a single call
const llm = getDefaultLlamaCpp();
const textsToEmbed = vecQueries.map(q => formatQueryForEmbedding(q.text));
const embedModelUri = llm.getEmbedModelUri();
const textsToEmbed = vecQueries.map(q => formatQueryForEmbedding(q.text, embedModelUri));
hooks?.onEmbedStart?.(textsToEmbed.length);
const embedStart = Date.now();
const embeddings = await llm.embedBatch(textsToEmbed);
@@ -3274,7 +3277,8 @@ export async function structuredSearch(
const vecSearches = searches.filter(s => s.type === 'vec' || s.type === 'hyde');
if (vecSearches.length > 0) {
const llm = getDefaultLlamaCpp();
const textsToEmbed = vecSearches.map(s => formatQueryForEmbedding(s.query));
const embedModelUri = llm.getEmbedModelUri();
const textsToEmbed = vecSearches.map(s => formatQueryForEmbedding(s.query, embedModelUri));
hooks?.onEmbedStart?.(textsToEmbed.length);
const embedStart = Date.now();
const embeddings = await llm.embedBatch(textsToEmbed);