https://documentchat4u.streamlit.app/

A production-ready Retrieval-Augmented Generation (RAG) pipeline for querying PDF documents using natural language. Built with FastAPI, Inngest, Streamlit, and Qdrant, it is fully deployable on Render + Streamlit Cloud and has no OpenAI dependency.
Streamlit Cloud (UI)
↓
Render FastAPI (Backend)
↓
Inngest Cloud (Workflow Orchestration)
↓
Qdrant Cloud (Vector Database)
PDF Upload Flow
Streamlit → POST /upload → Save PDF on Render
→ Trigger rag/ingest_pdf via Inngest
→ Load → Chunk → Embed → Store in Qdrant
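The chunking step of the upload flow can be illustrated with a small sketch. This is not the repository's code: the tech stack lists LlamaIndex's SentenceSplitter for chunking, whereas the version below is a simplified character-based sliding window. The name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip windows that contain only whitespace
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, provided it is shorter than the overlap.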
Query Flow
Streamlit → POST /query → Trigger rag/query_pdf_ai via Inngest
→ Embed question → Search Qdrant
→ Return extractive answer → Display in UI
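The search and extractive-answer steps of the query flow can be sketched in memory. Qdrant performs the real search server-side; here a plain list stands in for the collection, cosine similarity is assumed as the distance metric, and `cosine`, `search`, and `extractive_answer` are hypothetical names rather than functions from this repository.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float],
           store: list[tuple[list[float], str]],
           top_k: int = 5) -> list[str]:
    """Return the texts of the top_k stored chunks most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]


def extractive_answer(contexts: list[str]) -> str:
    """Extractive mode: return the retrieved chunks themselves, no generation step."""
    return "\n\n".join(contexts)
```

In extractive mode the answer is only as good as the retrieval, so `top_k` trades recall against the amount of text shown to the user.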
- Upload and ingest PDF documents via a modern Streamlit dashboard
- Semantic search using sentence-transformers (no OpenAI required)
- Extractive answer mode: returns retrieved context directly
- Inngest-powered durable workflow orchestration with retries and throttling
- Qdrant Cloud vector store with configurable dimensions
- Secure backend proxy for Inngest polling (no secrets exposed in the frontend)
- Deterministic chunk UUIDs to avoid duplicate vectors on re-ingestion
- OpenAI support available as an optional drop-in for embeddings and answer generation
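One common way to get deterministic chunk UUIDs is a name-based UUID (version 5) derived from the document identifier and the chunk index. The sketch below assumes that scheme; `chunk_id`, `source_id`, and the `"{source_id}:{chunk_index}"` key format are hypothetical and may differ from what the repository does.

```python
import uuid


def chunk_id(source_id: str, chunk_index: int) -> str:
    """Derive a stable UUID from the source document and chunk position."""
    # uuid5 hashes the namespace and the name, so the same inputs always
    # produce the same UUID, across runs and across machines.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source_id}:{chunk_index}"))
```

Because Qdrant accepts UUIDs as point IDs and an upsert with an existing ID overwrites that point, re-ingesting the same PDF replaces its vectors instead of duplicating them.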
| Layer | Technology |
|---|---|
| Frontend | Streamlit |
| Backend API | FastAPI + Uvicorn |
| Workflow Engine | Inngest |
| Vector Database | Qdrant Cloud |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (or OpenAI) |
| PDF Parsing | LlamaIndex (PDFReader, SentenceSplitter) |
| Deployment | Render (backend), Streamlit Cloud (frontend) |
- Python 3.11+
- uv package manager
- Docker (for local Qdrant)
- Node.js (for local Inngest Dev Server)
git clone https://github.com/your-username/RAG-based-agent.git
cd RAG-based-agent
uv sync --frozen
Copy the example env file and fill in your values:
cp .env.example .env
# Inngest
INNGEST_EVENT_KEY=your_event_key
INNGEST_SIGNING_KEY=your_signing_key
# Qdrant
QDRANT_URL=https://your-cluster.qdrant.io
QDRANT_API_KEY=your_qdrant_api_key
QDRANT_COLLECTION=docs_minilm
QDRANT_DIM=384
# Embeddings
EMBED_PROVIDER=sentence-transformers
SENTENCE_TRANSFORMER_MODEL=sentence-transformers/all-MiniLM-L6-v2
EMBED_DIM=384
# Answer generation
ANSWER_PROVIDER=extractive
# Optional: OpenAI
# OPENAI_API_KEY=sk-...
Start a local Qdrant instance with Docker:
docker run -d --name qdrantRAGdb \
-p 6333:6333 \
-v "$(pwd)/qdrant_storage:/qdrant/storage" \
qdrant/qdrant
Start the Inngest Dev Server and point it at the local backend:
npx --yes --ignore-scripts=false inngest-cli@latest dev \
-u http://127.0.0.1:8000/api/inngest --no-discovery
Start the FastAPI backend:
uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload
Start the Streamlit frontend:
uv run streamlit run streamlit_app.py

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /upload | Upload a PDF and trigger ingestion |
| POST | /query | Ask a question over ingested documents |
| GET | /runs/{event_id} | Poll Inngest run status (server-side proxy) |
| POST | /api/inngest | Inngest webhook handler |
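The /query endpoint can also be called from Python. The sketch below builds the same request as the curl example that follows, using only the standard library; `build_query_request` is a hypothetical helper, and the response format is not assumed here because it is not documented in this README.

```python
import json
import urllib.request


def build_query_request(base_url: str, question: str, top_k: int = 5) -> urllib.request.Request:
    """Build a POST /query request with the JSON body the curl example sends."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request (requires a reachable backend):
#   with urllib.request.urlopen(build_query_request(url, "..."), timeout=30) as resp:
#       body = resp.read()
```

Separating request construction from sending keeps the helper easy to test without network access.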
curl -X POST https://your-backend.onrender.com/query \
-H "Content-Type: application/json" \
-d '{"question": "What is the main topic?", "top_k": 5}'

| Setting | Value |
|---|---|
| Build Command | pip install uv && uv sync --frozen |
| Start Command | uv run uvicorn main:app --host 0.0.0.0 --port $PORT |
Required environment variables on Render:
INNGEST_EVENT_KEY
INNGEST_SIGNING_KEY
QDRANT_URL
QDRANT_API_KEY
QDRANT_COLLECTION=docs_minilm
QDRANT_DIM=384
EMBED_PROVIDER=sentence-transformers
SENTENCE_TRANSFORMER_MODEL=sentence-transformers/all-MiniLM-L6-v2
EMBED_DIM=384
ANSWER_PROVIDER=extractive
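A startup check can catch a missing variable or a dimension mismatch before the first request fails. The sketch below is an assumption about how such a check might look, not code from this repository; `check_settings` and `REQUIRED` are hypothetical names, and the variable names come from the list above.

```python
from collections.abc import Mapping

# Variables the list above marks as required on Render.
REQUIRED = (
    "INNGEST_EVENT_KEY",
    "INNGEST_SIGNING_KEY",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "QDRANT_DIM",
    "EMBED_DIM",
)


def check_settings(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]

    qdrant_dim = env.get("QDRANT_DIM", "").strip()
    embed_dim = env.get("EMBED_DIM", "").strip()
    # The collection's vector size must equal the embedding model's output size.
    if qdrant_dim and embed_dim and qdrant_dim != embed_dim:
        problems.append(f"QDRANT_DIM ({qdrant_dim}) does not match EMBED_DIM ({embed_dim})")
    return problems
```

Calling `check_settings(os.environ)` at application startup and failing fast on a non-empty result is one way to use it.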
Add the following to your Streamlit secrets:
FASTAPI_URL = "https://your-backend.onrender.com"
INNGEST_UI_URL = "https://app.inngest.com"
- Create an account at inngest.com
- Go to the Production environment → Event Keys → create a new event key
- Go to the Signing Key section → copy your signing key
- Add both to your Render environment variables
- Register your deployed endpoint in Inngest Cloud:
https://your-backend.onrender.com/api/inngest
| Mode | EMBED_PROVIDER | Requires |
|---|---|---|
| Open-source (default) | sentence-transformers | Nothing; runs locally |
| OpenAI | openai | OPENAI_API_KEY with credits |
| Fake (testing only) | set USE_FAKE_EMBEDDINGS=1 | Nothing |
⚠️ Always use the same embedding mode for ingestion and querying. Vectors produced by different models are not comparable, so mixing modes produces meaningless search results.
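How the fake embedding mode works internally is not described here. As an illustration only, one hypothetical way to produce deterministic test vectors without any model is to hash the text; `fake_embed` and its hashing scheme are assumptions, and the default dimension of 384 simply matches the EMBED_DIM value used elsewhere in this README.

```python
import hashlib
import math


def fake_embed(text: str, dim: int = 384) -> list[float]:
    """Deterministic pseudo-embedding for tests; carries no semantic meaning."""
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        # Each SHA-256 digest yields 32 bytes; the counter varies the input
        # so that enough bytes can be generated for any dimension.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(byte / 127.5 - 1.0 for byte in digest)  # map 0..255 to -1..1
        counter += 1
    values = values[:dim]

    # Normalize to unit length so cosine and dot-product scores agree.
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

Vectors like these are useful for exercising the ingestion and query plumbing, but similar texts do not land near each other, so search quality cannot be evaluated with them.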
RAG-based-agent/
├── main.py            # FastAPI app + Inngest functions
├── data_loader.py     # PDF loading, chunking, embedding
├── vector_db.py       # Qdrant client wrapper
├── custom_types.py    # Shared type definitions
├── streamlit_app.py   # Streamlit frontend
├── .env.example       # Environment variable template
├── .gitignore
└── pyproject.toml     # uv/pip dependencies
- Never commit .env to Git; it is listed in .gitignore
- The /runs/{event_id} endpoint proxies Inngest polling server-side so your INNGEST_SIGNING_KEY is never exposed to the browser or Streamlit Cloud
- Rotate any API keys that were accidentally exposed
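Because the frontend only talks to GET /runs/{event_id}, its polling loop needs no Inngest credentials. A minimal sketch of such a loop follows; `poll_until_done` is a hypothetical helper, `fetch_status` would wrap an HTTP GET to the backend, and the shape of the status payload is an assumption, which is why the completion check is passed in as `is_done`.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval: float = 1.0,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until is_done accepts the result or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        # Give up if waiting one more interval would pass the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("run did not finish before the timeout")
        sleep(interval)
```

The `sleep` parameter exists so the loop can be tested without real delays.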
MIT License. See LICENSE for details.
- LlamaIndex for PDF parsing and chunking
- Qdrant for vector storage
- Inngest for durable workflow orchestration
- Sentence Transformers for open-source embeddings