An enterprise-grade internal knowledge and incident handling platform built on top of RAG.
Instead of a generic "chat with docs" demo, this project focuses on a real operational workflow:
Intake -> Triage -> Investigation -> Escalation -> Resolution -> Knowledge Capture
- GitHub (main): Koala-la-la/AI-RAG-System
- GitCode (mirror): 2501_94493648/KB-AI
In real teams, large volumes of API docs, runbooks, SOPs, and FAQs create friction:
- Slow onboarding for new teammates
- Inconsistent troubleshooting responses
- Repeated "same issue, same manual search" cycles
- Weak incident traceability and poor knowledge retention
This project turns RAG into a workflow-centric product, not just a chatbot.

Target users:
- Technical support / customer support teams
- SRE / operations teams
- On-call engineers
- Product operations teams
Provide a single internal workspace to:
- Register and process cases
- Retrieve grounded answers from internal knowledge
- Enforce case lifecycle transitions
- Track SLA and audit trails
- Evaluate quality and iterate continuously
Cases are first-class objects with strict status transitions:
`new` -> `triaged` -> `investigating` -> `pending_escalation` -> `escalated` -> `resolved` -> `archived`
Illegal transitions are rejected by backend validation.
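
Here is a minimal sketch of how this validation could look. It assumes the lifecycle is the strictly linear sequence above and that only single-step forward moves are legal. The real backend may allow other edges (for example, resolving without escalation). `CaseStatus`, `ALLOWED_TRANSITIONS`, and `validate_transition` are hypothetical names, not the project's code.

```python
from enum import Enum


class CaseStatus(str, Enum):
    NEW = "new"
    TRIAGED = "triaged"
    INVESTIGATING = "investigating"
    PENDING_ESCALATION = "pending_escalation"
    ESCALATED = "escalated"
    RESOLVED = "resolved"
    ARCHIVED = "archived"


# Assumption: each status may only advance to the next one in declaration order.
_ORDER = list(CaseStatus)
ALLOWED_TRANSITIONS: dict[CaseStatus, set[CaseStatus]] = {
    status: {_ORDER[i + 1]} if i + 1 < len(_ORDER) else set()
    for i, status in enumerate(_ORDER)
}


class IllegalTransition(ValueError):
    """Raised when a requested status change is not an allowed edge."""


def validate_transition(current: str, target: str) -> CaseStatus:
    """Return the target status if the move is legal, otherwise raise."""
    # Unknown status strings raise ValueError from the Enum constructor.
    src, dst = CaseStatus(current), CaseStatus(target)
    if dst not in ALLOWED_TRANSITIONS[src]:
        raise IllegalTransition(f"{src.value} -> {dst.value} is not allowed")
    return dst
```

Keeping the edges in a single lookup table lets you extend the lifecycle later (for example, with a hypothetical reopen edge) without touching the validation logic.
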
SLA deadlines are auto-generated by priority:
| Priority | First Response SLA | Resolution SLA |
|---|---|---|
| `low` | 240 minutes | 72 hours |
| `medium` | 120 minutes | 24 hours |
| `high` | 30 minutes | 8 hours |
| `critical` | 10 minutes | 2 hours |
The system exposes breach flags for both response and resolution.
Every important action is logged per case:
- case created
- case updated
- status transitioned
- messages appended
- case deleted
Each event includes actor, timestamp, event type, and details.
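
One possible shape for such an event is sketched below. Several pieces are assumptions rather than the project's storage format: the `AuditEvent` dataclass, the example `event_type` value, the placeholder `case-001`, and the JSON-lines serialization.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEvent:
    case_id: str
    actor: str
    event_type: str  # hypothetical values, e.g. "case_created", "status_transitioned"
    details: dict[str, Any] = field(default_factory=dict)
    # UTC timestamp in ISO 8601, filled in when the event is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_event(timeline: list[str], event: AuditEvent) -> None:
    """Append the event as one JSON line; earlier lines are never modified."""
    timeline.append(json.dumps(asdict(event), ensure_ascii=False))


timeline: list[str] = []
append_event(
    timeline,
    AuditEvent(
        case_id="case-001",
        actor="user001",
        event_type="status_transitioned",
        details={"from": "new", "to": "triaged"},
    ),
)
print(timeline[0])
```

An append-only log, where earlier entries are never edited, is what makes the timeline trustworthy as an audit trail.
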

- Roles: `member`, `admin`
- Document visibility: `private`, `team`, `org`
- Retrieval and evaluation are both permission-aware
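
Below is a minimal sketch of a visibility check that could gate retrieved chunks. The rules are assumptions the project does not confirm:

- `private` documents are visible only to their owner.
- `team` documents are visible to members of the owner's team.
- `org` documents are visible to everyone.
- `admin` sees everything.

`User`, `Document`, `can_view`, and `filter_visible` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    role: str  # "member" or "admin"
    team_id: str


@dataclass(frozen=True)
class Document:
    doc_id: str
    owner_id: str
    team_id: str
    visibility: str  # "private", "team", or "org"


def can_view(user: User, doc: Document) -> bool:
    """Assumed visibility rules; the real project may differ."""
    if user.role == "admin":
        return True
    if doc.visibility == "org":
        return True
    if doc.visibility == "team":
        return user.team_id == doc.team_id
    if doc.visibility == "private":
        return user.user_id == doc.owner_id
    return False  # unknown visibility values are denied by default


def filter_visible(user: User, docs: list[Document]) -> list[Document]:
    """Keep only documents the user may see, for both retrieval and evaluation."""
    return [d for d in docs if can_view(user, d)]
```

Filtering in Python after retrieval is the simplest option. Pushing the same rule into the vector store query as a metadata filter avoids fetching chunks that are then discarded.
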

RAG pipeline:
- Query rewrite (context-aware)
- Case-aware routing
- Hybrid retrieval (semantic + lexical)
- Reranking
- Grounded-answer constraints and citation exposure

Evaluation:
- Generated multi-document benchmark suite
- JSON / Markdown report outputs
- Pass rate and quality trend tracking

```mermaid
flowchart LR
    UI[React UI] --> API[FastAPI API]
    API --> AUTH[Auth and RBAC]
    API --> WF[Case Workflow Engine]
    API --> RAG[RAG Pipeline]
    API --> EVAL[Evaluation Engine]
    WF --> STORE[Conversation and Case Store]
    WF --> AUDIT[Audit Timeline Store]
    RAG --> REWRITE[Query Rewrite]
    RAG --> ROUTER[Case Router]
    RAG --> RETRIEVE[Hybrid Retrieval]
    RETRIEVE --> MILVUS[Milvus]
    RAG --> RERANK[Rerank and Grounding]
    RAG --> LLM[LLM Response]
    API --> REG[Document Registry]
    REG --> INGEST[Upload and Chunking]
    INGEST --> MILVUS
    API --> REDIS[Redis Cache]
```

- Backend: FastAPI, Pydantic, LangGraph, LangChain
- Frontend: React, Vite, Axios
- Vector DB: Milvus
- Cache: Redis
- Model Serving: Ollama (default), optional HuggingFace embeddings
- Document Processing: PyPDF
- Evaluation: custom benchmark runner

```text
app/
  api/
    auth_api.py
    chat_api.py
    conversation_api.py
    document_api.py
    evaluation_api.py
    upload_api.py
  auth/
  cache/
  conversation/
    store.py
  evaluation/
    benchmark.py
  knowledge/
    access.py
    document_manager.py
    document_registry.py
    embedder.py
    milvus_store.py
    splitter.py
  llm/
    ollama_client.py
  rag/
    graph.py
    prompt.py
    ranking.py
    retriever.py
    router.py
web/react-ui/src/
  pages/
    Dashboard.jsx
    Chat.jsx
    Documents.jsx
    Evaluation.jsx
    Login.jsx
data/
  auth_users.json
  auth_sessions.json
  document_registry.json
  conversations/
  uploads/
  eval/
```


Prerequisites:
- Python 3.10+
- Node.js 18+
- Docker Desktop
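
Install backend and frontend dependencies:

```bash
pip install -r requirements.txt
cd web/react-ui
npm install
```

Start the infrastructure services:

```bash
docker compose up -d redis etcd minio milvus
```

Use `.env.example` as the baseline configuration. Key variables:

- `MILVUS_HOST`, `MILVUS_PORT`
- `REDIS_HOST`, `REDIS_PORT`
- `OLLAMA_BASE_URL`, `OLLAMA_MODEL`
- `EMBED_PROVIDER` (`ollama` or `huggingface`)

Run the backend from the repository root:

```bash
python -m uvicorn app.main:app --reload
```

In a second terminal, run the frontend:

```bash
cd web/react-ui
npm run dev
```

Open: http://localhost:5173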
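
Below is a minimal sketch of loading the key variables in Python. The defaults are assumptions: `19530` is Milvus's standard port, `6379` is Redis's default port, and `http://localhost:11434` is Ollama's default API address. `OLLAMA_MODEL` has no safe default, so the sketch requires it. `Settings` and `load_settings` are hypothetical names; check `.env.example` for the project's real values.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    milvus_host: str
    milvus_port: int
    redis_host: str
    redis_port: int
    ollama_base_url: str
    ollama_model: str
    embed_provider: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the key variables; the fallback values are assumptions."""
    provider = env.get("EMBED_PROVIDER", "ollama")
    if provider not in ("ollama", "huggingface"):
        raise ValueError(f"EMBED_PROVIDER must be 'ollama' or 'huggingface', got {provider!r}")
    model = env.get("OLLAMA_MODEL")
    if not model:
        raise ValueError("OLLAMA_MODEL is not set; copy it from .env.example")
    return Settings(
        milvus_host=env.get("MILVUS_HOST", "localhost"),
        milvus_port=int(env.get("MILVUS_PORT", "19530")),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=model,
        embed_provider=provider,
    )
```

Failing fast on a missing model or an invalid `EMBED_PROVIDER` surfaces misconfiguration at startup rather than at the first query.
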

Auth:
- `POST /api/auth/register`
- `POST /api/auth/login`
- `POST /api/auth/logout`

Documents:
- `POST /api/upload`
- `GET /api/documents`
- `DELETE /api/documents`
- `POST /api/documents/reindex`

Conversations / cases:
- `GET /api/conversations`
- `POST /api/conversations`
- `PATCH /api/conversations/{conversation_id}`
- `POST /api/conversations/{conversation_id}/transition`
- `GET /api/conversations/{conversation_id}/messages`
- `GET /api/conversations/{conversation_id}/timeline`
- `DELETE /api/conversations/{conversation_id}`

Chat:
- `POST /api/chat`

Evaluation:
- `GET /api/evaluation/latest-report`
- `POST /api/evaluation/run-generated-suite`

List indexed sources:

```bash
python -m app.evaluation.benchmark --list-sources --user-id user001 --kb-id defaultkb
```

Generate a suite template:

```bash
python -m app.evaluation.benchmark --generate-suite-template --user-id user001 --kb-id defaultkb --benchmarks-dir data/eval/benchmarks/generated --suite-output data/eval/generated_suites/user001_defaultkb.json
```

Run the suite:

```bash
python -m app.evaluation.benchmark --suite data/eval/generated_suites/user001_defaultkb.json
```

If you see:
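
```text
Fail connecting to server on 127.0.0.1:19530
```

Milvus is not reachable. Check the containers and start them if needed:

```bash
docker compose ps
docker compose up -d redis etcd minio milvus
```

If model downloads fail:
- Retry the downloads
- Prefer `EMBED_PROVIDER=ollama` for a stable local flow
- Pre-cache model assets if needed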

If text looks garbled in the terminal, the cause is usually terminal encoding rather than data corruption. Verify the files with a UTF-8-aware reader.
- Assignee and escalation target management (R&D / Ops)
- SLA alerts and notification center
- Auto-promote failed cases into FAQ/SOP candidates
- Org-level knowledge operations dashboard
Issues and PRs are welcome.
If this project helps you, consider giving it a star.