Enhanced RAG Application with TruLens Evaluation

An AI application that lets users ask questions about their documents, powered by LLMs via the OpenAI API, with comprehensive RAG triad evaluation through TruLens integration.

✨ Features

🎯 Core RAG Functionality

  • Document Ingestion: Automatic parsing, chunking, metadata extraction, embedding generation and storage
  • Hybrid Search: Combines sparse/dense vector search for optimal retrieval
  • Contextual Retrieval: Returns most relevant chunks based on user queries
  • Chat & Completions: Abstracts retrieval complexity with prompt engineering

📊 Evaluation & Analytics

  • RAG Triad Evaluation: Real-time scoring of Context Relevance, Groundedness, and Answer Relevance
  • TruLens Integration: Comprehensive evaluation framework with detailed metrics
  • Performance Tracking: Latency, token usage, and cost monitoring
  • Visual Analytics: Interactive charts and dashboards for metrics visualization

πŸŽ›οΈ UI Controls

  • Dynamic Configuration: Real-time adjustment of chunk size (100-2000 tokens) and top-k (1-20)
  • Answer Style Toggle: Choose between Concise, Balanced, or Explanatory responses
  • Temperature Control: Adjust LLM creativity and determinism
  • Evaluation Toggle: Enable/disable real-time evaluation for performance

💬 Chat Features

  • Chat History: Persistent conversation storage with search capabilities
  • Real-time Evaluation: Display RAG triad scores for each response
  • Document Upload: Multi-format support (PDF, DOCX, TXT, MD)
  • Status Monitoring: System health and performance indicators

πŸ—οΈ Architecture

Core Components

  • LlamaIndex: RAG pipeline framework with vector stores and embeddings
  • Qdrant: High-performance vector database for semantic search
  • OpenAI API: LLM integration for response generation
  • TruLens: Evaluation framework for RAG triad metrics
  • Gradio: Modern web interface with real-time updates

Evaluation Metrics

  1. Context Relevance: How relevant retrieved chunks are to the query
  2. Groundedness: How well the answer is supported by retrieved context
  3. Answer Relevance: How relevant the answer is to the original question
  4. RAG Triad Score: Combined score (average of the three metrics)
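The combined score in step 4 is a plain average of the three metrics. A minimal sketch (the function name `rag_triad_score` is illustrative, not taken from this codebase):

```python
def rag_triad_score(context_relevance: float,
                    groundedness: float,
                    answer_relevance: float) -> float:
    """Combine the three RAG triad metrics (each in [0, 1]) into one score."""
    return (context_relevance + groundedness + answer_relevance) / 3.0
```

So a response with strong retrieval and answer relevance but weak groundedness still shows a visibly lower combined score, which is exactly what the dashboard surfaces.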

Hybrid Search Strategy

  • Vector Search: Semantic similarity using embeddings
  • Lexical Search: BM25 algorithm for keyword matching
  • Score Fusion: Intelligent combination of both approaches
  • Dynamic Weighting: Adaptive scoring based on query characteristics
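The fusion step above can be sketched as a weighted sum of min-max-normalized scores. This is only an illustration of the idea; the actual fusion and dynamic-weighting logic live in the LlamaIndex retriever configuration, and `alpha` here is an assumed knob:

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize scores per retriever so the two score scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(dense: dict[str, float], sparse: dict[str, float],
         alpha: float = 0.5) -> dict[str, float]:
    """Blend semantic (dense) and BM25 (sparse) scores; alpha weights the dense side."""
    dense_n, sparse_n = normalize(dense), normalize(sparse)
    docs = set(dense_n) | set(sparse_n)
    return {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
            for d in docs}
```

Raising `alpha` favors semantic matches (paraphrased queries); lowering it favors exact keyword hits, which is the intuition behind dynamic weighting.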

🚀 Quick Start

Prerequisites

  • Python 3.13+
  • OpenAI API key
  • Git

Installation

  1. Clone the repository:
git clone https://github.com/brainboost/enhanced-rag-ui.git
cd enhanced-rag-ui
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
# Using uv (recommended)
pip install uv
uv sync

# Or using pip
pip install -r requirements.txt
  4. Configure the environment:
# Copy the environment template
cp .env.example .env

# Edit with your configuration
nano .env

Required environment variables:

OPENAI_API_KEY=your_openai_api_key_here
  5. Create sample documents (optional):
# With uv (preferred)
uv run app.py --create-samples

# Or with plain Python
python app.py --create-samples
  6. Launch the application:
# With uv (preferred)
uv run app.py

# Or with plain Python
python app.py

The application will be available at http://localhost:7860

📖 Usage Guide

Basic Chat Interaction

  1. Upload Documents: Use the file upload interface to ingest your documents
  2. Configure Settings: Adjust chunk size, top-k, and answer style as needed
  3. Ask Questions: Type your questions in the chat interface
  4. View Evaluation: Check real-time RAG triad scores for each response

Advanced Configuration

Chunk Size Optimization

  • Small chunks (100-300): Better for precise information retrieval
  • Medium chunks (300-800): Good balance of context and precision
  • Large chunks (800-2000): Better for complex queries requiring more context
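The chunk-size trade-off is easiest to see in a simplified sliding-window chunker. The real pipeline uses LlamaIndex's token-aware splitter with the configured `chunk_size`/`chunk_overlap`; this word-based version is only illustrative:

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks with a fixed overlap between neighbors."""
    words = text.split()
    step = max(chunk_size - overlap, 1)  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break  # the window has reached the end of the document
    return chunks
```

Smaller windows yield more, tighter chunks (precise retrieval); larger windows yield fewer, broader chunks (more context per hit), which is the trade-off the slider exposes.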

Top-K Selection

  • Low (1-3): High precision, lower recall
  • Medium (4-8): Balanced precision and recall
  • High (9-20): High recall, lower precision

Answer Styles

  • Concise: Brief, direct answers with essential information only
  • Balanced: Well-structured answers with sufficient detail
  • Explanatory: Detailed answers with explanations and context

Evaluation Interpretation

RAG Triad Scores

  • 0.8-1.0: Excellent performance
  • 0.6-0.8: Good performance
  • 0.4-0.6: Moderate performance
  • 0.2-0.4: Poor performance
  • 0.0-0.2: Very poor performance
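These bands map directly to labels; a small helper like the hypothetical one below makes the thresholds explicit:

```python
def score_band(score: float) -> str:
    """Map a RAG triad score in [0, 1] to the performance bands above."""
    bands = [(0.8, "Excellent"), (0.6, "Good"), (0.4, "Moderate"),
             (0.2, "Poor"), (0.0, "Very poor")]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "Very poor"  # guards against slightly negative inputs
```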

Individual Metrics

  • Context Relevance: Quality of retrieved documents
  • Groundedness: Factual accuracy based on provided context
  • Answer Relevance: How well the response addresses the query

🔧 Configuration

Settings Files

  • settings.yaml: Main configuration file
  • .env: Environment variables and API keys

Key Configuration Options

# Server settings
server:
  port: 7860
  host: 0.0.0.0

# LLM configuration
llm:
  mode: openai
  model: gpt-5
  temperature: 0.1
  max_tokens: 1024

# Embedding settings
embedding:
  mode: openai
  model: text-embedding-3-small
  embed_dim: 1536

# Vector store
vectorstore:
  database: qdrant

# Ingestion settings
ingestion:
  chunk_size: 512
  chunk_overlap: 50

📊 Monitoring & Analytics

Metrics Dashboard

  • Real-time Scores: Live RAG triad evaluation
  • Performance Charts: Latency and cost trends over time
  • Usage Statistics: Token consumption and API costs
  • Export Capabilities: Download metrics in JSON or CSV format

Performance Optimization

  • Latency Monitoring: Track response times across configurations
  • Cost Tracking: Monitor OpenAI API usage and expenses
  • Quality Trends: Analyze evaluation scores over time
  • A/B Testing: Compare different configuration settings

🧪 Testing

Running Tests

# Install test dependencies
uv pip install -e .[dev]

# Run all tests
pytest

# Run with coverage
pytest --cov=.

# Run specific test file
pytest tests/test_evaluation.py

Test Coverage

  • Unit tests for evaluation functions
  • Integration tests for RAG pipeline
  • UI component testing
  • Configuration validation tests

🚀 Deployment

Local Deployment

# Production mode (default port 7860)
uv run app.py

# Custom port with a public share link
uv run app.py --port 8080 --share

# With custom configuration
uv run app.py --config production.yaml

Docker Deployment

FROM python:3.13-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 7860

CMD ["python", "app.py", "--port", "7860"]

Environment Variables

  • Development: Use .env file
  • Production: Use environment variables or secret management
  • Docker: Pass via -e flags or docker-compose

πŸ” Troubleshooting

Common Issues

Installation Problems

  • Python Version: Ensure Python 3.13+ is installed
  • Dependencies: Try uv sync --frozen before running
  • Environment: Verify all required environment variables are set

Performance Issues

  • High Latency: Reduce chunk size or top-k value
  • Poor Scores: Adjust chunk size, top-k, temperature, or answer style
  • Memory Usage: Use smaller chunk sizes for large documents

Evaluation Issues

  • Zero Scores: Check OpenAI API key and connectivity
  • Inconsistent Results: Verify document ingestion completed successfully
  • Missing Metrics: Ensure evaluation is enabled in configuration

Debug Mode

# Enable debug logging
export LOG_LEVEL=DEBUG

# Run with verbose output
python app.py --config debug.yaml
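LOG_LEVEL is read at startup; a minimal sketch of how such a flag typically wires into Python's logging (illustrative, not the app's exact code):

```python
import logging
import os

def configure_logging() -> logging.Logger:
    """Configure root logging from the LOG_LEVEL environment variable."""
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    # Fall back to INFO if the variable holds an unknown level name
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    return logging.getLogger("enhanced-rag-ui")
```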

Project Structure

enhanced-rag-ui/
├── app.py                    # Main application entry point
├── requirements.txt          # Python dependencies
├── settings.yaml             # Main configuration
├── evaluation/               # Evaluation modules
│   ├── trulens_integration.py
│   └── feedback_functions.py
├── ui/                       # User interface components
│   ├── chat_interface.py
│   └── components.py
├── utils/                    # Utility modules
│   ├── config_manager.py
│   └── metrics_tracker.py
└── data/documents/           # Document storage

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments
