A simple yet powerful Retrieval-Augmented Generation (RAG) pipeline built with Streamlit that allows you to upload PDF documents and chat with them using OpenAI's GPT models.
- 📄 Multi-PDF Support: Upload and process multiple PDF documents at once
- 🔍 Smart Chunking: Automatically splits documents into optimal chunks for processing
- 💬 Interactive Chat: Natural conversation interface with your documents
- 📖 Source Citations: See exactly which parts of your documents the AI is referencing
- 🚀 Easy Setup: Simple installation process with helpful setup script
- 💾 Session Memory: Maintains conversation history during your session
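The smart-chunking feature above can be sketched in plain Python. This is an illustrative sketch, not the app's actual implementation — the function name, the 500-character chunk size, and the 50-character overlap are all assumptions:

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks with a small overlap.

    Overlapping chunks help retrieval: a sentence cut at a chunk
    boundary still appears whole in the neighbouring chunk.
    (Sizes here are illustrative, not the app's actual defaults.)
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Production pipelines usually split on sentence or paragraph boundaries rather than raw character counts, but the size/overlap trade-off is the same.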
- Python 3.8 or higher
- OpenAI API key (Get one here)
1. **Clone the repository**

   ```bash
   git clone <your-repo-url>
   cd rag-pipeline
   ```
2. **Run the setup script**

   ```bash
   python setup.py
   ```

   This will:
   - Check your Python version
   - Install all dependencies
   - Create a `.env` file from the template
3. **Add your OpenAI API key**
   - Open the `.env` file
   - Replace `your-openai-api-key-here` with your actual API key
4. **Run the application**

   ```bash
   streamlit run app.py
   ```
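Step 3 works because the app reads the key from the `.env` file at startup (typically via the `python-dotenv` package). A minimal standard-library-only sketch of that mechanism — the function name is illustrative, not taken from the app:

```python
import os
from pathlib import Path

def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped; variables already set
    in the environment are not overwritten.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

In practice `from dotenv import load_dotenv; load_dotenv()` does the same job with more edge cases handled.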
1. **Start the App**: Run `streamlit run app.py` and open the provided URL in your browser
2. **Upload Documents**:
   - Click on "Browse files" in the sidebar
   - Select one or more PDF files
   - Click "🚀 Process Documents"
3. **Start Chatting**:
   - Once documents are processed, type your question in the chat input
   - The AI will answer based on the content of your uploaded documents
   - Click "View Sources" to see which parts of the documents were used
4. **Clear History**: Use the "Clear Chat History" button to start a fresh conversation
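Under the hood, answering a question means comparing the question's embedding against the stored chunk embeddings and handing the top-k matches to the model as context. A toy sketch of that retrieval step with plain Python lists — the real app presumably uses a vector store, and all names here are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]
```

The retrieved chunks are what "View Sources" surfaces in the chat interface.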
You can modify these settings in `app.py`:

- Model: Change `gpt-3.5-turbo` to `gpt-4` for more advanced responses
- Temperature: Adjust the `temperature` parameter (0–1) for response creativity
- Chunk Size: Modify `chunk_size` in the text splitter for different document processing
- Retrieved Documents: Change the `k` value in `search_kwargs` to retrieve more/fewer sources
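Gathered in one place, the knobs above might look roughly like this. The variable names and defaults are assumptions for illustration — check your copy of `app.py` for the actual ones:

```python
# Illustrative settings; the actual names and values in app.py may differ.
MODEL_NAME = "gpt-3.5-turbo"   # swap to "gpt-4" for more advanced responses
TEMPERATURE = 0.7              # 0 = deterministic, 1 = more creative
CHUNK_SIZE = 1000              # characters per chunk fed to the text splitter
SEARCH_KWARGS = {"k": 4}       # number of source chunks retrieved per question

def describe_settings():
    """Collect the tunable parameters in one place for easy inspection."""
    return {
        "model": MODEL_NAME,
        "temperature": TEMPERATURE,
        "chunk_size": CHUNK_SIZE,
        "k": SEARCH_KWARGS["k"],
    }
```

Raising `k` gives the model more context per answer but increases token usage and cost.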
rag-pipeline/
├── app.py # Main Streamlit application
├── setup.py # Setup helper script
├── requirements.txt # Python dependencies
├── env_example.txt # Environment variables template
├── .gitignore # Git ignore file
└── README.md # This file
- Make sure you've created a `.env` file from `env_example.txt`
- Ensure your API key is correctly added to the `.env` file
- Run `pip install -r requirements.txt` to install all dependencies
- Check that your PDFs are not password-protected
- Ensure you have enough OpenAI API credits
- Try with smaller PDF files first
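The first two checklist items lend themselves to a small automated preflight check. This is a standard-library-only sketch with an illustrative function name, not part of the app itself:

```python
from pathlib import Path

def preflight_check(env_path=".env"):
    """Return a list of setup problems; an empty list means all checks passed."""
    problems = []
    path = Path(env_path)
    if not path.exists():
        problems.append("%s not found; copy env_example.txt to .env" % env_path)
    else:
        text = path.read_text()
        if "OPENAI_API_KEY" not in text:
            problems.append("OPENAI_API_KEY missing from .env")
        elif "your-openai-api-key-here" in text:
            problems.append("OPENAI_API_KEY still set to the placeholder value")
    return problems
```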
- Better Results: Upload documents that are related to each other for more coherent conversations
- Specific Questions: Ask specific questions rather than broad ones
- Multiple Sources: Upload multiple documents to create a comprehensive knowledge base
- Cost Management: Monitor your OpenAI API usage to manage costs
- Never commit your `.env` file to version control
- Keep your OpenAI API key secure
- The app processes documents locally before sending their content to OpenAI
Potential improvements you could make:
- Support for more file formats (TXT, DOCX, etc.)
- Persistent vector storage
- User authentication
- Export chat history
- Fine-tuning options
- Local LLM support
This project is open source and available under the MIT License.