Query a Docs2DB RAG database with modern retrieval techniques. Docs2DB-API provides a Python library for hybrid search (vector + BM25) with reranking.
What it does:
- Queries RAG databases created by docs2db
- Hybrid search: combines vector similarity with BM25 full-text search
- Reciprocal Rank Fusion (RRF) for result combination
- Cross-encoder reranking for improved result quality
- Question refinement for query expansion
- Universal RAG engine adaptable to multiple API frameworks
What it's for:
- Building RAG applications and agents
- Adding document search to LLM systems
- Serving RAG APIs (FastAPI, LlamaStack, custom frameworks)
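Because the engine is an ordinary async Python object, it can sit behind any API framework. The sketch below wraps the same `UniversalRAGEngine` API used in the quick-start example further down in a minimal FastAPI app; the `/search` route and the response shape are illustrative, not part of the package.

```python
# Minimal sketch: serving the engine behind a FastAPI endpoint.
# The route and response shape are illustrative; the engine API
# (UniversalRAGEngine, start(), search_documents()) is the one
# shown in the quick-start example below.
from contextlib import asynccontextmanager

from fastapi import FastAPI

from docs2db_api.rag.engine import UniversalRAGEngine

engine = UniversalRAGEngine()


@asynccontextmanager
async def lifespan(app: FastAPI):
    await engine.start()  # connect to the database once at startup
    yield


app = FastAPI(lifespan=lifespan)


@app.get("/search")
async def search(q: str):
    result = await engine.search_documents(q)
    return {
        "documents": [
            {
                "score": doc["similarity_score"],
                "source": doc["document_path"],
                "text": doc["text"],
            }
            for doc in result.documents
        ]
    }
```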
Install:

```bash
uv add docs2db-api
```

Step 1: Create a database with docs2db

```bash
uv tool install docs2db
docs2db pipeline /path/to/documents
```

This creates `ragdb_dump.sql`.
Step 2: Restore and query
```bash
# Start database
uv run docs2db-api db-start

# Restore dump
uv run docs2db-api db-restore ragdb_dump.sql

# Check status
uv run docs2db-api db-status
```

Step 3: Use in your application
```python
import asyncio

from docs2db_api.rag.engine import UniversalRAGEngine, RAGConfig


async def main():
    # Initialize engine with defaults (auto-detects database from environment)
    engine = UniversalRAGEngine()
    await engine.start()

    # Or with specific settings:
    # config = RAGConfig(
    #     model_name="granite-30m-english",
    #     max_chunks=5,
    #     similarity_threshold=0.7
    # )
    # db_config = {
    #     "host": "localhost",
    #     "port": "5432",
    #     "database": "ragdb",
    #     "user": "postgres",
    #     "password": "postgres"
    # }
    # engine = UniversalRAGEngine(config=config, db_config=db_config)
    # await engine.start()

    # Search
    result = await engine.search_documents("How do I configure authentication?")

    for doc in result.documents:
        print(f"Score: {doc['similarity_score']:.3f}")
        print(f"Source: {doc['document_path']}")
        print(f"Text: {doc['text'][:200]}...\n")


asyncio.run(main())
```

Docs2DB-API includes a native LlamaStack tool provider for agent-based RAG. See the complete demo with setup scripts and examples:
📁 demos/llama-stack/ - LlamaStack RAG tool provider with agent demos
(Note: this demo currently needs adjustment to run from the PyPI package.)
Configuration precedence (highest to lowest):
- CLI arguments: `--host`, `--port`, `--db`, `--user`, `--password`
- Environment variables: `DOCS2DB_DB_HOST`, `DOCS2DB_DB_PORT`, `DOCS2DB_DB_DATABASE`, `DOCS2DB_DB_USER`, `DOCS2DB_DB_PASSWORD`
- `DOCS2DB_DB_URL`, e.g. `postgresql://user:pass@host:port/database`
- `postgres-compose.yml` in the current directory
- Defaults: `localhost:5432`, user=postgres, password=postgres, db=ragdb
Examples:
```bash
# Use defaults
uv run docs2db-api db-status

# Environment variables
export DOCS2DB_DB_HOST=prod.example.com
export DOCS2DB_DB_DATABASE=mydb
uv run docs2db-api db-status

# DOCS2DB_DB_URL (cloud providers)
export DOCS2DB_DB_URL="postgresql://user:pass@host:5432/db"
uv run docs2db-api db-status

# CLI arguments
uv run docs2db-api db-status --host localhost --db mydb
```

Note: Don't mix `DOCS2DB_DB_URL` with individual `DOCS2DB_DB_*` variables.
Configure the LLM used for query refinement:
```bash
export DOCS2DB_LLM_BASE_URL=http://localhost:11434   # OpenAI-compatible API (e.g., Ollama)
export DOCS2DB_LLM_MODEL=qwen2.5:7b-instruct         # Model name
export DOCS2DB_LLM_TIMEOUT=30.0                      # HTTP timeout (seconds)
export DOCS2DB_LLM_TEMPERATURE=0.7                   # Generation temperature
export DOCS2DB_LLM_MAX_TOKENS=500                    # Max tokens per response
```

RAG settings control retrieval behavior (similarity thresholds, reranking, refinement, etc.) and can be stored in the database or provided at query time.
- `refinement_prompt` - Custom prompt for query refinement
- `enable_refinement` (`refinement`) - Enable question refinement (true/false)
- `enable_reranking` (`reranking`) - Enable cross-encoder reranking (true/false)
- `similarity_threshold` - Similarity threshold (0.0-1.0)
- `max_chunks` - Maximum chunks to return
- `max_tokens_in_context` - Maximum tokens in the context window
- `refinement_questions_count` - Number of refined questions to generate
These settings can be provided at several levels (highest to lowest precedence):
- Query parameters - Passed directly to `engine.search_documents()` or via CLI flags (`--threshold`, `--limit`, etc.)
- RAGConfig object - Provided when initializing `UniversalRAGEngine`
- Database settings - Stored in the database via the `docs2db config` command (see docs2db)
- Code defaults - Built-in fallback values
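As a sketch of the two programmatic levels, assuming the setting names listed above map onto `RAGConfig` fields (only `model_name`, `max_chunks`, and `similarity_threshold` appear in the quick-start example, so the commented fields are assumptions to verify against your installed version):

```python
# Sketch: engine-level RAG settings via RAGConfig.
import asyncio

from docs2db_api.rag.engine import UniversalRAGEngine, RAGConfig


async def main():
    config = RAGConfig(
        model_name="granite-30m-english",   # documented in the quick-start example
        max_chunks=10,
        similarity_threshold=0.6,
        # enable_reranking=True,            # assumed field name (see settings list above)
        # enable_refinement=False,          # assumed field name (see settings list above)
    )
    # db_config can be passed as well (see the quick-start example);
    # otherwise the database is auto-detected from the environment.
    engine = UniversalRAGEngine(config=config)
    await engine.start()

    # Query-time parameters take precedence over RAGConfig; on the CLI these
    # are the --threshold / --limit flags shown below.
    result = await engine.search_documents("deployment guide")
    for doc in result.documents:
        print(doc["document_path"])


asyncio.run(main())
```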
```bash
docs2db-api db-start              # Start PostgreSQL with Podman/Docker
docs2db-api db-stop               # Stop PostgreSQL (data preserved)
docs2db-api db-destroy            # Stop and delete all data
docs2db-api db-status             # Check connection and stats
docs2db-api db-restore <file>     # Restore database from dump
docs2db-api manifest              # Generate list of documents
```

```bash
# Basic search
docs2db-api query "How do I configure authentication?"

# Advanced options
docs2db-api query "deployment guide" \
  --model granite-30m-english \
  --limit 20 \
  --threshold 0.8 \
  --no-refine                     # Disable question refinement
```

Docs2DB-API implements modern retrieval techniques:
- Contextual chunks - LLM-generated context situating each chunk within its document (Anthropic's approach)
- Hybrid search - Combines BM25 (lexical) and vector embeddings (semantic)
- Reciprocal Rank Fusion (RRF) - Intelligent result combination
- Cross-encoder reranking - Improved result quality
- Question refinement - Query expansion for better matches
- PostgreSQL full-text search - tsvector with GIN indexing for BM25
- pgvector similarity - Fast vector search with HNSW indexes
- Universal RAG engine - Adaptable to multiple API frameworks
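Reciprocal Rank Fusion itself is compact; the sketch below is a generic illustration of the technique (with the conventional constant k=60), not the package's internal implementation:

```python
# Generic illustration of Reciprocal Rank Fusion (RRF), not the package's
# internal code: each ranked list contributes 1 / (k + rank) per document,
# and documents are re-ordered by their summed score.
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Fuse a BM25 (lexical) ranking with a vector-similarity (semantic) ranking.
bm25_hits = ["doc_a", "doc_c", "doc_b"]
vector_hits = ["doc_b", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# doc_a wins: it sits near the top of both lists.
```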
See LICENSE for details.