Drop your PDF documents into a folder, and the system automatically builds an AI backend that answers questions strictly based on your files. It is optimized for the Korean language and fully containerized for production using an Nginx + Gunicorn multi-worker architecture.
- 📂 Plug-and-Play Architecture:
- No complex DB setup required. Just place any PDF files into the /data directory, and the server handles the parsing, splitting, and embedding automatically on boot.
- 🇰🇷 Korean-Optimized Hybrid Search:
- Overcomes the limitations of English-centric embedding models by combining Dense Retrieval (FAISS) and Sparse Retrieval (BM25 with KoNLPy morphological analysis) via an EnsembleRetriever at a 5:5 (equal-weight) ratio, to improve retrieval accuracy for Korean queries.
- 🛡️ Hallucination Mitigation & Strict Citation:
- Applied advanced prompt engineering to force the LLM to explicitly cite the exact source file and page number for every piece of information used (e.g., [Source: manual.pdf, Page 12]).
- ⚡ Cost & Speed Optimization via Local Caching:
- Prevented OpenAI API Rate Limit (429) errors and significantly reduced initialization costs by caching vector stores and preprocessed sparse matrix objects (pickle) to local disk. Server reboot time is reduced to under 1 second.
- 🏭 Production-Ready Infrastructure:
- Scalable Gunicorn WSGI multi-worker setup to overcome Flask's built-in concurrency limits.
- Nginx (Alpine) deployed as a reverse proxy for secure traffic routing within a private Docker network.
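To illustrate how a 5:5 hybrid of dense and sparse results can be merged, here is a minimal pure-Python sketch of weighted Reciprocal Rank Fusion, the strategy LangChain's EnsembleRetriever is documented to use. It is not this project's code: the function name `weighted_rrf`, the document IDs, and the constant `c=60` are assumptions chosen for illustration.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document receives weight / (rank + c) from every list it appears in;
    documents ranked well by both retrievers accumulate the highest score.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("one weight per ranked list is required")

    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)

    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: one list from a dense (FAISS-style) retriever,
# one from a sparse (BM25-style) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = weighted_rrf([dense_hits, sparse_hits], weights=[0.5, 0.5])
print(fused)
```

With equal weights, a document that both retrievers rank near the top (here `doc_a`) outranks one that only a single retriever returns, which is the practical benefit of combining the two signals.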
- Docker and Docker Compose installed on your local machine or server.
- An active OpenAI API key.
# Clone the repository
git clone https://github.com/heain/korean-rag-server
cd korean-rag-server
# Set up environment variables
echo "OPENAI_API_KEY=your_api_key_here" > .env
# Prepare your RAG data
# Create a 'data' folder and place YOUR target PDF files inside.
mkdir data
On the first run, the server will automatically load your PDFs, split the text, and embed the documents, caching the indexes (faiss_index, bm25_index.pkl) locally.
docker-compose up -d --build
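The first-run caching behaviour can be sketched as a "load if cached, otherwise build and save" helper. This is a simplified illustration, not the server's actual implementation: `load_or_build` and `build_index` are hypothetical names, and the dictionary stands in for the real preprocessed BM25 data.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_path, build_fn):
    """Return the cached object if the file exists; otherwise build it and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Fast path on later boots: skip parsing, tokenizing, and embedding.
        with cache_path.open("rb") as f:
            return pickle.load(f)

    obj = build_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(obj, f)
    return obj


build_calls = []


def build_index():
    # Stand-in for the expensive first-run work (hypothetical data).
    build_calls.append(1)
    return {"tokens": [["한국어", "문서"], ["검색", "예시"]]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bm25_index.pkl"
    first = load_or_build(path, build_index)   # builds and writes the cache
    second = load_or_build(path, build_index)  # reads the cache, no rebuild

print(len(build_calls), first == second)
```

Because `pickle.load` can execute arbitrary code, a cache like this should only ever be loaded from files the server wrote itself.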
Once the containers are up, the API will be accessible via http://localhost:8080/api/ask.
Open test_api.py, replace the TO BE FILLED placeholder with your question, and then run the script.
python test_api.py
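If you prefer to call the endpoint from your own code, a minimal standard-library client might look like the sketch below. The contents of test_api.py are not shown here, so the JSON field name `question` and the assumption that the server replies with JSON are guesses; check test_api.py for the actual request format.

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:8080/api/ask"


def build_request(question):
    """Build a JSON POST request. The 'question' field name is an assumption."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=60):
    """Send the question and return the decoded JSON reply (assumed to be JSON)."""
    with urlopen(build_request(question), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, once the containers are running:
#   print(ask("What does the manual say about installation?"))
```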