Question answering system over official technical documentation with hybrid retrieval, reranking, extractive reading, explanatory answer synthesis, benchmark evaluation, and deployable web delivery.
- GitHub repository: mohammadrezakarami/Technical-Docs-QA
- Hugging Face Space: mokarami/Technical-Docs-QA
This project is a real QA pipeline built on top of official technical documentation, not a notebook demo and not a toy baseline. It covers the full path from corpus building and indexing to retrieval, answer generation, evaluation, and web deployment.
Current corpus coverage:
- FastAPI official docs
- Pandas official docs
- Python official docs
- Pydantic docs
- Scikit-learn docs
Included components:
- official documentation crawling and cleaning
- sentence-aware chunking with overlap
- BM25 sparse retrieval
- FAISS dense retrieval with sentence-transformers
- cross-encoder reranking
- extractive QA reader
- grounded explanatory answer synthesis
- no-answer handling
- benchmark evaluation and error analysis
- FastAPI backend
- browser interface
- Docker deployment
- Hugging Face Space deployment
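To make "sentence-aware chunking with overlap" concrete, here is a minimal sketch. It is not the repository's implementation: the function names, the regex sentence splitter, and the `max_words` and `overlap_sentences` parameters are all assumptions chosen for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(text: str, max_words: int = 40, overlap_sentences: int = 1) -> list[str]:
    """Group whole sentences into chunks, repeating the last sentences of each
    chunk at the start of the next one so context is not cut at chunk borders."""
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        projected = sum(len(s.split()) for s in current) + len(sentence.split())
        if current and projected > max_words:
            # Close the current chunk and carry its tail over as overlap.
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_sentences("A b c. D e f. G h i. J k l.", max_words=6):
        print(chunk)
```

Because sentences are never split, a chunk can exceed `max_words` when a single sentence (plus its carried-over overlap) is longer than the budget.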
Pipeline steps:
- Crawl official documentation pages
- Clean and normalize the extracted text
- Build sentence-aware chunks
- Create sparse and dense retrieval indexes
- Retrieve candidate chunks with BM25 and dense search
- Rerank candidates with a cross-encoder
- Read answer spans with an extractive QA model
- Aggregate support sentences for explanatory questions
- Return a grounded final answer through API or web interface
- Evaluate the system with answer and retrieval metrics
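The steps above retrieve candidates with both BM25 and dense search before reranking, but this README does not say how the two candidate lists are merged. One common option is reciprocal rank fusion (RRF), sketched below purely as an assumption; the function name, the chunk IDs, and the constant `k=60` are illustrative and not taken from the repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge several ranked lists of chunk IDs into one list.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks that
    rank well in both the sparse and the dense list rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score (descending); break ties by chunk ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_n]]


if __name__ == "__main__":
    bm25_hits = ["c1", "c2", "c3"]   # hypothetical sparse results
    dense_hits = ["c2", "c4", "c1"]  # hypothetical dense results
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

In a pipeline like the one described, the fused list would then be passed to the cross-encoder, which rescores each (question, chunk) pair.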
This repository now exposes the QA system through a web app and API, so the primary presentation path is no longer the terminal.
Available endpoints:
- GET /: browser interface
- GET /api/health: health check
- GET /api/status: artifact readiness summary
- POST /api/ask: inference endpoint
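For callers that prefer Python over curl, the inference endpoint could be reached with the standard library alone. This is a sketch under assumptions: the request fields (`question`, `style`, `threshold`) and the base URL mirror the curl request in this README, the helper names are hypothetical, and the response is returned as parsed JSON because its schema is not documented here.

```python
import json
import urllib.request


def build_ask_request(
    question: str,
    style: str = "explanatory",
    threshold: float = 0.01,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/ask."""
    payload = {"question": question, "style": style, "threshold": threshold}
    return urllib.request.Request(
        url=f"{base_url}/api/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, **kwargs) -> dict:
    """Send the request to a running server and return the decoded JSON body."""
    request = build_ask_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the web app to be running locally.
    print(ask("How do you set custom response headers in FastAPI?"))
```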
Example API request:
curl -X POST http://127.0.0.1:8000/api/ask \
-H "Content-Type: application/json" \
-d '{
"question": "How do you set custom response headers in FastAPI, and why does using a Response parameter work?",
"style": "explanatory",
"threshold": 0.01
}'

Install dependencies:
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

Run the web app locally:
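.venv/bin/python scripts/run_real_qa_api.py

Then open the browser interface at the server's address; the example API request in this README assumes http://127.0.0.1:8000.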
Build and run locally with Docker:
docker build -t technical-docs-qa .
docker run -p 7860:7860 technical-docs-qa

The project is prepared for a Docker-based Hugging Face Space deployment.
Deployment-related files:
- Dockerfile
- .dockerignore
- scripts/run_real_qa_api.py
- docs/HF_SPACE_DEPLOYMENT.md
Recommended deployment strategy:
- keep artifacts/real_qa/index/ in the deployed repo
- keep artifacts/real_qa/processed/ in the deployed repo
- keep artifacts/real_qa/models/ out of the repo
- let public model weights download at runtime when needed
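A deployment following this strategy can check at startup that the committed artifacts are actually present. The sketch below is hypothetical: `artifact_status` and its return shape are illustrative and do not describe what `/api/status` really returns; only the three directory names come from the list above.

```python
from pathlib import Path

REQUIRED_DIRS = ("index", "processed")  # expected in the deployed repo
OPTIONAL_DIRS = ("models",)             # left out; weights download at runtime


def artifact_status(root: str = "artifacts/real_qa") -> dict[str, bool]:
    """Report which artifact directories exist and contain at least one entry."""
    base = Path(root)

    def non_empty(name: str) -> bool:
        path = base / name
        return path.is_dir() and any(path.iterdir())

    status = {name: non_empty(name) for name in REQUIRED_DIRS + OPTIONAL_DIRS}
    # The app is considered ready when every required directory is populated.
    status["ready"] = all(status[name] for name in REQUIRED_DIRS)
    return status


if __name__ == "__main__":
    print(artifact_status())
```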
Latest verified benchmark snapshot:
- QA Exact Match: 0.3214
- QA F1: 0.3959
- Explanatory F1: 0.2263
- Explanatory term coverage: 0.6857
- Retrieval Hit@1: 0.7857
- Retrieval Hit@3: 0.9286
- Retrieval MRR: 0.8452
- No-answer accuracy: 1.0000
- Example count: 30
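For readers unfamiliar with these metrics, the sketch below shows how per-example Exact Match, token-level F1, Hit@k, and reciprocal rank are commonly computed; a reported score is the mean over examples. The normalization (lowercasing and stripping punctuation) is a simplifying assumption, and the repository's evaluation script may define these metrics differently.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    # Simplified normalization (assumption): lowercase, drop punctuation, split on whitespace.
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        # Both empty counts as a match (useful for no-answer cases).
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def hit_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    # 1.0 if any relevant chunk appears in the top k results.
    return float(any(doc_id in relevant_ids for doc_id in ranked_ids[:k]))


def reciprocal_rank(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    # 1 / rank of the first relevant chunk; MRR is the mean of this over queries.
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0
```

For example, a prediction of "the Response parameter" against a gold answer of "Response parameter" scores 0 on Exact Match but 0.8 on F1, which is why F1 is usually the higher of the two.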
Generated reports:
- artifacts/real_qa/reports/evaluation_report.json
- artifacts/real_qa/reports/evaluation_report.md
- artifacts/real_qa/reports/error_analysis.md
Delivery documents:
- docs/TECHNICAL_REPORT.md
- docs/PRESENTATION_SCRIPT.md
Repository layout:
- src/real_qa/: core QA pipeline
- src/real_qa/web.py: FastAPI app
- src/real_qa/ui/index.html: browser interface
- scripts/run_real_qa_api.py: web server entrypoint
- scripts/evaluate_real_qa.py: benchmark evaluation
- scripts/analyze_real_qa_errors.py: error analysis
- data/eval/real_qa_eval.json: evaluation benchmark
- artifacts/real_qa/index/: retrieval indexes
- artifacts/real_qa/processed/: processed corpus
- artifacts/real_qa/reports/: generated evaluation reports
CLI tooling still exists for development, reproducibility, and evaluation, but it is no longer the main presentation path.
Useful commands:
make build
make train
make serve
make eval
make analyze
make test

- The benchmark now includes both extractive and explanatory questions.
- Retrieval quality (Hit@3 0.9286, MRR 0.8452) is stronger than final answer quality (QA F1 0.3959), so the current bottleneck is reading and synthesis rather than document matching.
- The repository aims to be delivery-oriented and reproducible; it does not claim state-of-the-art results.