An AI application that lets users ask questions about their documents using an LLM via the OpenAI API, with comprehensive RAG triad evaluation through TruLens integration.
- Document Ingestion: Automatic parsing, chunking, metadata extraction, embedding generation and storage
- Hybrid Search: Combines sparse (keyword) and dense (vector) search for optimal retrieval
- Contextual Retrieval: Returns most relevant chunks based on user queries
- Chat & Completions: Abstracts retrieval complexity with prompt engineering
- RAG Triad Evaluation: Real-time scoring of Context Relevance, Groundedness, and Answer Relevance
- TruLens Integration: Comprehensive evaluation framework with detailed metrics
- Performance Tracking: Latency, token usage, and cost monitoring
- Visual Analytics: Interactive charts and dashboards for metrics visualization
- Dynamic Configuration: Real-time adjustment of chunk size (100-2000 tokens) and top-k (1-20)
- Answer Style Toggle: Choose between Concise, Balanced, or Explanatory responses
- Temperature Control: Adjust LLM creativity and determinism
- Evaluation Toggle: Enable/disable real-time evaluation (disable for faster responses)
- Chat History: Persistent conversation storage with search capabilities
- Real-time Evaluation: Display RAG triad scores for each response
- Document Upload: Multi-format support (PDF, DOCX, TXT, MD)
- Status Monitoring: System health and performance indicators
- LlamaIndex: RAG pipeline framework with vector stores and embeddings
- Qdrant: High-performance vector database for semantic search
- OpenAI API: LLM integration for response generation
- TruLens: Evaluation framework for RAG triad metrics
- Gradio: Modern web interface with real-time updates
- Context Relevance: How relevant retrieved chunks are to the query
- Groundedness: How well the answer is supported by retrieved context
- Answer Relevance: How relevant the answer is to the original question
- RAG Triad Score: Combined score (average of the three metrics)
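
The combined score above is a plain average of the three metrics; a minimal sketch (function name is hypothetical, not from the codebase):

```python
def rag_triad_score(context_relevance: float,
                    groundedness: float,
                    answer_relevance: float) -> float:
    """Average the three RAG triad metrics, each scored in [0, 1]."""
    return (context_relevance + groundedness + answer_relevance) / 3

# Example: strong grounding but a weakly relevant answer
print(rag_triad_score(0.9, 0.95, 0.6))  # ≈ 0.82
```
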
- Vector Search: Semantic similarity using embeddings
- Lexical Search: BM25 algorithm for keyword matching
- Score Fusion: Intelligent combination of both approaches
- Dynamic Weighting: Adaptive scoring based on query characteristics
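
Score fusion and dynamic weighting could look roughly like the sketch below; the min-max normalization and the `alpha` heuristic are assumptions, not the app's exact logic:

```python
def fuse_scores(bm25: dict[str, float], dense: dict[str, float],
                alpha: float = 0.5) -> dict[str, float]:
    """Weighted fusion of lexical (BM25) and semantic (dense) scores.

    alpha near 1.0 favors semantic similarity; near 0.0 favors exact
    keyword matches. Scores are min-max normalized per retriever so
    the two scales are comparable.
    """
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    bm25_n, dense_n = normalize(bm25), normalize(dense)
    docs = bm25_n.keys() | dense_n.keys()
    return {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * bm25_n.get(d, 0.0)
            for d in docs}

# Dynamic weighting: e.g., raise alpha for long natural-language
# questions, lower it for short keyword-style queries.
```
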
- Python 3.13+
- OpenAI API key
- Git
- Clone the repository:

```bash
git clone https://github.com/brainboost/enhanced-rag-ui.git
cd enhanced-rag-ui
```

- Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:
```bash
# Using uv (recommended)
pip install uv
uv sync

# Or using pip
pip install -r requirements.txt
```

- Configure the environment:
```bash
# Copy environment template
cp .env.example .env

# Edit with your configuration
nano .env
```

Required environment variables:

```bash
OPENAI_API_KEY=your_openai_api_key_here
```

- Create sample documents (optional):
```bash
# with uv (preferred)
uv run app.py --create-samples

# or with plain Python
python app.py --create-samples
```

- Launch the application:
```bash
# with uv (preferred)
uv run app.py

# old school
python app.py
```

The application will be available at http://localhost:7860.
- Upload Documents: Use the file upload interface to ingest your documents
- Configure Settings: Adjust chunk size, top-k, and answer style as needed
- Ask Questions: Type your questions in the chat interface
- View Evaluation: Check real-time RAG triad scores for each response
- Small chunks (100-300): Better for precise information retrieval
- Medium chunks (300-800): Good balance of context and precision
- Large chunks (800-2000): Better for complex queries requiring more context
- Low (1-3): High precision, lower recall
- Medium (4-8): Balanced precision and recall
- High (9-20): High recall, lower precision
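
Both knobs map directly onto standard LlamaIndex components; a hedged sketch assuming the usual `VectorStoreIndex` setup (not the app's exact pipeline):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Chunk size/overlap control how documents are split before embedding.
# Requires OPENAI_API_KEY for the default OpenAI embeddings.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)

docs = SimpleDirectoryReader("data/documents").load_data()
index = VectorStoreIndex.from_documents(docs, transformations=[splitter])

# Top-k controls how many chunks come back per query
retriever = index.as_retriever(similarity_top_k=5)
chunks = retriever.retrieve("How does the warranty handle water damage?")
```
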
- Concise: Brief, direct answers with essential information only
- Balanced: Well-structured answers with sufficient detail
- Explanatory: Detailed answers with explanations and context
- 0.8-1.0: Excellent performance
- 0.6-0.8: Good performance
- 0.4-0.6: Moderate performance
- 0.2-0.4: Poor performance
- 0.0-0.2: Very poor performance
- Context Relevance: Quality of retrieved documents
- Groundedness: Factual accuracy based on provided context
- Answer Relevance: How well the response addresses the query
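
The bands above translate directly into a lookup; a tiny hypothetical helper:

```python
def interpret_score(score: float) -> str:
    """Map a metric score in [0, 1] to the qualitative bands above."""
    for threshold, label in [(0.8, "Excellent"), (0.6, "Good"),
                             (0.4, "Moderate"), (0.2, "Poor")]:
        if score >= threshold:
            return label
    return "Very poor"

print(interpret_score(0.71))  # "Good"
```
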
- `settings.yaml`: Main configuration file
- `.env`: Environment variables and API keys
```yaml
# Server settings
server:
  port: 7860
  host: 0.0.0.0

# LLM configuration
llm:
  mode: openai
  model: gpt-5
  temperature: 0.1
  max_tokens: 1024

# Embedding settings
embedding:
  mode: openai
  model: text-embedding-3-small
  embed_dim: 1536

# Vector store
vectorstore:
  database: qdrant

# Ingestion settings
ingestion:
  chunk_size: 512
  chunk_overlap: 50
```
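
How these values are consumed at startup isn't shown here; a minimal loading sketch assuming PyYAML (the real `utils/config_manager.py` may expose a richer interface):

```python
import yaml  # PyYAML

def load_settings(path: str = "settings.yaml") -> dict:
    """Parse the YAML configuration shown above into a dict."""
    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)

settings = load_settings()
chunk_size = settings["ingestion"]["chunk_size"]  # 512
model = settings["llm"]["model"]                  # "gpt-5"
```
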
- Real-time Scores: Live RAG triad evaluation
- Performance Charts: Latency and cost trends over time
- Usage Statistics: Token consumption and API costs
- Export Capabilities: Download metrics in JSON or CSV format
- Latency Monitoring: Track response times across configurations
- Cost Tracking: Monitor OpenAI API usage and expenses
- Quality Trends: Analyze evaluation scores over time
- A/B Testing: Compare different configuration settings
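
A hedged sketch of the kind of record `utils/metrics_tracker.py` (see the project structure below) might keep per query; field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class QueryRecord:
    latency_s: float        # end-to-end response time
    prompt_tokens: int      # tokens sent to the LLM
    completion_tokens: int  # tokens generated
    cost_usd: float         # estimated API cost

@dataclass
class MetricsTracker:
    records: list[QueryRecord] = field(default_factory=list)

    def log(self, record: QueryRecord) -> None:
        self.records.append(record)

    def avg_latency(self) -> float:
        """Mean latency across logged queries (0.0 if none)."""
        if not self.records:
            return 0.0
        return sum(r.latency_s for r in self.records) / len(self.records)
```
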
```bash
# Install test dependencies
uv pip install -e .[dev]

# Run all tests
pytest

# Run with coverage
pytest --cov=.

# Run a specific test file
pytest tests/test_evaluation.py
```

- Unit tests for evaluation functions
- Integration tests for RAG pipeline
- UI component testing
- Configuration validation tests
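
For flavor, a minimal pytest unit test in the spirit of `tests/test_evaluation.py` (illustrative only; the helper is the hypothetical triad average from earlier, not the project's actual API):

```python
import pytest

def rag_triad_score(cr: float, g: float, ar: float) -> float:
    return (cr + g + ar) / 3

def test_triad_score_is_plain_average():
    assert rag_triad_score(0.9, 0.6, 0.3) == pytest.approx(0.6)

def test_perfect_scores_yield_one():
    assert rag_triad_score(1.0, 1.0, 1.0) == 1.0
```
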
```bash
# Production mode
uv run app.py                      # default port 7860
uv run app.py --port 8080 --share  # custom port, public share link

# With custom configuration
uv run app.py --config production.yaml
```

```dockerfile
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py", "--port", "7860"]
```

- Development: Use the `.env` file
- Production: Use environment variables or secret management
- Docker: Pass via `-e` flags or docker-compose
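
For development, the key can be loaded fail-fast from `.env`; a sketch assuming python-dotenv (the app's actual startup code may differ):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env if present; harmless no-op otherwise

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; copy .env.example to .env")
```
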
- Python Version: Ensure Python 3.13+ is installed
- Dependencies: Try `uv sync --frozen` before running
- Environment: Verify all required environment variables are set
- High Latency: Reduce chunk size or top-k value
- Poor Scores: Increase temperature or adjust answer style
- Memory Usage: Use smaller chunk sizes for large documents
- Zero Scores: Check OpenAI API key and connectivity
- Inconsistent Results: Verify document ingestion completed successfully
- Missing Metrics: Ensure evaluation is enabled in configuration
```bash
# Enable debug logging
export LOG_LEVEL=DEBUG

# Run with verbose output
python app.py --config debug.yaml
```
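
One plausible way `LOG_LEVEL` could be wired into Python's logging module (a sketch; the app's actual logging setup isn't shown in this README):

```python
import logging
import os

# basicConfig accepts level names like "DEBUG" directly
logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO").upper(),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger(__name__).debug("Debug logging enabled")
```
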
```text
enhanced-rag-ui/
├── app.py                     # Main application entry point
├── requirements.txt           # Python dependencies
├── settings.yaml              # Main configuration
├── evaluation/                # Evaluation modules
│   ├── trulens_integration.py
│   └── feedback_functions.py
├── ui/                        # User interface components
│   ├── chat_interface.py
│   └── components.py
├── utils/                     # Utility modules
│   ├── config_manager.py
│   └── metrics_tracker.py
└── data/documents/            # Document storage
```
This project is licensed under the MIT License - see the LICENSE file for details.
- LlamaIndex - RAG framework
- TruLens - Evaluation framework
- Gradio - UI framework
- Qdrant - Vector database
- OpenAI - LLM API