Web-grounded multi-hop research engine — live search, hybrid reranking, grounded LLM answer
4/5 (80%) pass rate on a 5-query benchmark that includes a multi-hop query · Built from first principles
Here's the full pipeline trace for a multi-hop query — from decomposition to grounded answer.

KnowledgeOps is a web-grounded research engine built from first principles.
It searches the live web, retrieves and ranks the most answerable content, and generates answers with a local LLM that is prompted to use only the retrieved context and to reply "I don't know" when the context does not contain the answer.
- Building a multi-hop RAG pipeline from scratch
- Answerability-driven retrieval — not just semantic similarity
- Hybrid reranking combining cosine similarity with question-type-aware heuristics
- Query decomposition using the LLM itself as a planner
- Grounding enforcement to prevent hallucination
- Systematic pipeline debugging with full stage-by-stage logging
- Multi-hop query decomposition — complex queries broken into independent sub-questions, each retrieved separately
- Hybrid reranking — cosine similarity + spaCy answerability scoring weighted by question type (who/when/where vs what/how/why)
- Grounding enforcement — LLM answers only from retrieved context, returns "I don't know" otherwise
- Fault-tolerant extraction — handles SSL errors, login walls, timeouts, and 403s gracefully
- Adaptive prompting — extraction mode for entity questions, explanation mode for descriptive questions
- Full pipeline logging — every stage logged for systematic debugging
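
The hybrid reranking idea in the list above can be illustrated with a minimal sketch. It is not the project's reranker: the weights in `WEIGHTS`, the function names, and the regex-based `answerability` helper (a crude stand-in for the spaCy scoring) are all assumptions made for illustration.

```python
import math
import re

# Hypothetical (cosine weight, answerability weight) per question class.
WEIGHTS = {
    "entity": (0.5, 0.5),       # who / when / where
    "descriptive": (0.8, 0.2),  # what / how / why
}

def first_word(question: str) -> str:
    words = question.strip().lower().split()
    return words[0] if words else ""

def question_class(question: str) -> str:
    return "entity" if first_word(question) in {"who", "when", "where"} else "descriptive"

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def answerability(question: str, chunk: str) -> float:
    """Regex stand-in for entity detection: 1.0 if the chunk seems to hold
    the kind of entity the question asks for, else 0.0."""
    word = first_word(question)
    if word == "when":
        return 1.0 if re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", chunk) else 0.0
    if word in {"who", "where"}:
        return 1.0 if re.search(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", chunk) else 0.0
    return 0.0

def rerank(question, query_vec, candidates, top_k=3):
    """candidates: (chunk_text, chunk_vector) pairs returned by vector recall."""
    w_cos, w_ans = WEIGHTS[question_class(question)]
    scored = [
        (w_cos * cosine(query_vec, vec) + w_ans * answerability(question, text), text)
        for text, vec in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

With weights like these, a chunk that names a person can outrank a chunk that is semantically closer to a "who" question, while "what/how/why" questions stay dominated by cosine similarity.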
| Component | Technology |
|---|---|
| API | FastAPI, Uvicorn |
| Search | DuckDuckGo (ddgs) |
| Extraction | requests, BeautifulSoup |
| Embeddings | SentenceTransformers (all-MiniLM-L6-v2) |
| Vector Store | ChromaDB (in-memory) |
| NLP | spaCy (en_core_web_sm) |
| LLM | Mistral via Ollama (local) |
| Language | Python 3.11 |
User Query
│
▼
QueryPlanner — LLM-based decomposition into sub-questions
│
▼
QueryProcessor — normalize, rewrite, expand per sub-question
│
▼
SearchEngine — DuckDuckGo, top 5 URLs after deduplication
│
▼
PageExtractor — article/main targeting, login-wall detection
│
▼
TextChunker — 1000 char chunks, min line length 50
│
▼
EmbeddingService — MiniLM, 384-dim vectors
│
▼
VectorStore — ChromaDB in-memory
│
▼
VectorRetriever — Top-10 recall
│
▼
Reranker — hybrid cosine + spaCy, question-type weights
│
▼
PromptBuilder — extraction/explanation mode, grounding enforcement
│
▼
OllamaProvider — Mistral local
│
▼
Answer
User Query
↓
Query Decomposition (QueryPlanner)
↓
Search + Extraction per Sub-question
↓
Chunk + Embed
↓
Vector Recall (Top-10)
↓
Hybrid Reranking (Top-3)
↓
Grounded Prompt → LLM
↓
Answer
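
The TextChunker stage in the pipeline above (1000-character chunks, minimum line length 50) might look roughly like the sketch below. It assumes that lines shorter than the minimum are discarded before chunking and that chunks are packed greedily; the function name and the packing strategy are illustrative guesses, not the project's code.

```python
def chunk_text(text: str, chunk_size: int = 1000, min_line_length: int = 50) -> list[str]:
    # Drop short lines, which on web pages are mostly menus, captions, and buttons.
    lines = [line.strip() for line in text.splitlines()]
    kept = [line for line in lines if len(line) >= min_line_length]

    chunks: list[str] = []
    current = ""
    for line in kept:
        # A single line longer than chunk_size is cut into fixed-size pieces.
        while len(line) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:chunk_size])
            line = line[chunk_size:]
        # Pack lines greedily until the next one would overflow the chunk.
        candidate = f"{current} {line}" if current else line
        if len(candidate) > chunk_size:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```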
- Query: What is FastAPI and who created it?
- Sub-questions (QueryPlanner): ["What is FastAPI?", "Who created FastAPI?"]
- Retrieved Context: FastAPI is a modern, high-performance web framework for building APIs with Python, created by Sebastián Ramírez in December 2018.
- Response: FastAPI is a modern web framework for building APIs with Python. It was created by Sebastián Ramírez.
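
A prompt that leads to a response like the one above could be assembled roughly as follows. This is a sketch under assumptions: the prompt wording, the `build_prompt` name, and the rule that picks extraction or explanation mode from the first word of the question are hypothetical.

```python
ENTITY_WORDS = {"who", "when", "where"}

def build_prompt(question: str, chunks: list[str]) -> str:
    words = question.strip().lower().split()
    # Extraction mode for entity questions, explanation mode otherwise.
    if words and words[0] in ENTITY_WORDS:
        task = "Answer with the specific name, date, or place, in one short sentence."
    else:
        task = "Explain the answer in a few sentences."
    # Number the chunks so a logged prompt can be matched to retrieved content.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Use only the context below. "
        'If the context does not contain the answer, reply exactly: "I don\'t know".\n'
        f"{task}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because the grounding rule lives only in the prompt text, it depends on the model following instructions; nothing in this sketch verifies the answer against the context afterwards.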
Benchmark of 5 queries covering all question types:
| Query | Type | Result |
|---|---|---|
| Who owns Virgin Group + born when? | who + when | PASS ✅ |
| What is the 2nd Amendment? | what | PASS ✅ |
| When did first person reach Mars? | grounding test | PASS ✅ |
| How to fractionally distillate crude oil? | how | PASS ✅ |
| Why did Bose resign from Congress? | why | PARTIAL |
Score: 4/5 passed (80%); the PARTIAL result is not counted as a pass.
Every stage of the pipeline is logged:
- Sub-questions from QueryPlanner
- URLs collected and processed
- Chunk count per run
- Retrieved chunk content
- Full prompt sent to LLM
- LLM response
This makes systematic debugging possible — any wrong answer can be traced to its exact failure stage without guessing.
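
One way to get this kind of stage-by-stage trace is a small helper around the standard `logging` module. The logger name, the `log_stage` helper, and the field names are assumptions for illustration, not the project's logger.

```python
import logging

logger = logging.getLogger("knowledge_ops.pipeline")  # hypothetical logger name

def log_stage(stage: str, **fields) -> None:
    """Emit one line per pipeline stage so a wrong answer can be traced back."""
    details = " ".join(f"{key}={value!r}" for key, value in fields.items())
    logger.info("stage=%s %s", stage, details)

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")

# Example calls, one per stage, with illustrative values.
log_stage("planner", sub_questions=["What is FastAPI?", "Who created FastAPI?"])
log_stage("search", urls_collected=5)
log_stage("chunker", chunk_count=42)
```

Keeping one line per stage with `key=value` fields makes it easy to search a run for the first stage whose output looks wrong.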
1. Install dependencies
    pip install -r requirements.txt
    python -m spacy download en_core_web_sm
2. Install Ollama and pull Mistral
    ollama pull mistral
3. Start the API server
    uvicorn main:app --reload
4. Run the interactive test
    python -m scripts.test_research_service
5. Run the evaluation benchmark
    python -m scripts.evaluate

knowledge_ops/
├── app/
│ ├── api/ — FastAPI server and routes
│ ├── extraction/ — HTML to text extraction
│ ├── llm/ — Ollama provider and LLM client
│ ├── processing/ — chunker, query processor, query planner
│ ├── prompts/ — prompt builder
│ ├── retrieval/ — embeddings, vector store, retriever, reranker
│ ├── search/ — search engine
│ ├── services/ — research service orchestrator
│ └── utils/ — logger
├── scripts/
│ ├── test_research_service.py
│ └── evaluate.py
├── main.py
├── requirements.txt
└── .gitignore
- Source reliability: results vary between runs, depending on which URLs DuckDuckGo returns and which of them can be fetched successfully
- Heuristic reranker: answerability scoring is rule-based (spaCy); a learned cross-encoder would likely be significantly stronger
- In-memory vector store: ChromaDB resets on every run, so nothing persists across sessions
- LLM latency: Ollama/Mistral averages 10-40 s per query, which is too slow for production use
- Grounding enforcement: prompt-based only; a programmatic cosine-similarity check was removed because synthesized answers did not match raw chunks closely enough for it to be reliable
- QueryPlanner non-determinism: the LLM's output format varies between runs; dict normalization is applied as a defensive fix
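
The defensive normalization mentioned in the last item could look something like this sketch. The output shapes it handles (a JSON list, a dict wrapping a list, a list of dicts, or a plain numbered list) and the name `normalize_sub_questions` are assumptions; the planner's actual formats are not documented here.

```python
import json
import re

def normalize_sub_questions(raw: str, fallback: str) -> list[str]:
    """Coerce planner output into a list of question strings.
    Falls back to the original query when nothing usable is found."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON: treat each line as a question and strip list markers.
        data = [re.sub(r"^\s*(?:[-*]|\d+[.)])\s+", "", line) for line in raw.splitlines()]

    if isinstance(data, dict):
        # e.g. {"sub_questions": [...]} or {"1": "...", "2": "..."}
        values = list(data.values())
        data = values[0] if len(values) == 1 and isinstance(values[0], list) else values
    if isinstance(data, str):
        data = [data]
    if not isinstance(data, list):
        data = []

    questions = []
    for item in data:
        if isinstance(item, dict):
            # e.g. {"question": "..."}: take the first string value.
            item = next((v for v in item.values() if isinstance(v, str)), "")
        if isinstance(item, str) and item.strip():
            questions.append(item.strip())
    return questions or [fallback]
```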
KnowledgeOps demonstrates a production-style web-grounded RAG engine combining live web search, multi-hop query decomposition, hybrid answerability-driven reranking, grounding enforcement, and local LLM inference — built and understood from first principles.
