LOCAL RAG-AGENT
Local RAG app to chat with and ask questions about your files, built with a simple PyQt5 UI, CLI tools, and a Chroma vector store. It allows uploading CSV/XLSX/PDF/DOCX/TXT files, chunks and embeds them with Ollama, then answers questions using retrieved context via LangChain.
Model used
This project uses a local Ollama instance.
Model: llama3.2:1b
Embedding model: mxbai-embed-large (pull with ollama pull mxbai-embed-large)
Usage
To start the UI, run main.py, then:
- Drag and drop, or select files.
- Start asking questions about the files.
Note: by default the DB directory is removed on exit to keep runs clean.
Key Libraries
- langchain: chaining and chunking utilities; connects to local Ollama for embeddings and LLMs.
- Chroma: persistent vector store and retriever.
- PyQt5: desktop UI (drag & drop ingest, chat-like Q&A view).
Prerequisites
- Python 3.10+ (3.12 recommended).
- Ollama installed and running (https://ollama.com). Pull the models you configure in config.yaml:
  ollama pull mxbai-embed-large
  ollama pull llama3.2:1b (or your chosen model)
Setup
Create a virtual environment, activate and install requirements:
python -m venv .venv
.venv\Scripts\Activate
pip install -r requirements.txt
Then review config.yaml, most importantly:
- EMBEDD_MODEL: embedding model name for Ollama.
- MODEL: chat/LLM model name for Ollama.
- db_location: Chroma directory (created if missing).
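For reference, a minimal config.yaml covering these keys might look like the following (the db_location path is an illustrative assumption; adjust values to the models you pulled):

```yaml
EMBEDD_MODEL: mxbai-embed-large   # embedding model served by Ollama
MODEL: llama3.2:1b                # chat/LLM model served by Ollama
db_location: ./chroma_db          # Chroma directory, created if missing
```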
How It Works (Workflow)
Uploaded files are stored directly in the Chroma database and remain available to the LLM during the session; when the app exits, the database is cleared and deleted.
- Ingest
  vector_store.read_data(...) extracts text from supported files.
  vector_store.chunk_str(...) splits the text into chunks.
  vector_store.add_to_db(file_path) reads the file, chunks it, embeds it, and writes it to the Chroma database.
- Retrieval + Generation
  config.load_config() builds a Chroma retriever and an OllamaLLM, both wired from config.yaml.
  rag_engine.generate_response(query) retrieves the top-k docs, formats a brief prompt, and invokes the LLM. The prompt asks the model to include sources (file names) available in the document metadata.
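The chunking step in the Ingest stage can be illustrated with a minimal sliding-window splitter. This is a sketch of the idea behind vector_store.chunk_str, not the project's actual implementation; the chunk size and overlap values are assumptions:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context is not lost at chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap  # advance by less than chunk_size to create overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# 1200 characters with chunk_size=500 and overlap=50 yields 3 chunks
print(len(chunk_text("a" * 1200)))  # 3
```

Overlapping windows are a common default because a sentence cut at a chunk boundary still appears whole in the neighboring chunk.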
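The retrieve-then-generate step can be sketched as follows. In the real app the documents come from the Chroma retriever and the prompt is passed to the OllamaLLM; the prompt wording and the (text, source) pair format here are assumptions, not the project's exact template:

```python
def format_prompt(query: str, docs: list[tuple[str, str]]) -> str:
    """Build a brief RAG prompt from (text, source_file) pairs of retrieved docs."""
    context = "\n\n".join(f"[source: {src}]\n{text}" for text, src in docs)
    return (
        "Answer the question using only the context below, "
        "and cite the source file names you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical retrieved doc; in the app this would come from Chroma metadata.
prompt = format_prompt("What is in the report?",
                       [("Quarterly revenue grew 10%.", "report.pdf")])
print(prompt)
```

Carrying the file name through the prompt is what lets the model cite sources from document metadata, as described above.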
Files & Responsibilities
- config.yaml: model names, chunking, and DB settings.
- config.py: loads the YAML, returns (CONFIG, RETRIVER, MODEL) wired to Ollama + Chroma.
- vector_store.py: read, chunk, embed, and persist to Chroma.
- rag_engine.py: prompt + generate response using the retriever and LLM.
- main_view.py: PyQt5 UI for ingestion and Q&A (spawns child processes for isolation).
- ingest_cli.py and rag_cli.py: simple CLI wrappers for headless ingest and Q&A.
Design Notes
- Keeps concerns separate: ingestion (I/O + chunking) vs retrieval/generation.
- Small, safe batches when adding to Chroma to avoid long blocking calls.
- Models and chunking are configurable via YAML without code changes.
- Uses subprocesses for long-running tasks from the UI to keep it responsive.
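The batched-write note above can be sketched with a small helper that splits documents into fixed-size batches before each store call. The batch size and the add_fn callback are illustrative assumptions, not the project's actual API:

```python
from typing import Callable

def add_in_batches(docs: list[str], add_fn: Callable[[list[str]], None],
                   batch_size: int = 32) -> int:
    """Write docs to the store in small batches to avoid one long blocking call."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    written = 0
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        add_fn(batch)  # e.g. an add call on a Chroma collection
        written += len(batch)
    return written

batches = []
add_in_batches([f"doc{i}" for i in range(70)], batches.append, batch_size=32)
print([len(b) for b in batches])  # [32, 32, 6]
```

Smaller batches also make progress reporting and cancellation from the UI easier, which complements the subprocess isolation noted above.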
