# Deployed Chatbot: RAG Q&A Chatbot
A Retrieval-Augmented Generation (RAG) based chatbot powered by Google Gemini, designed for intelligent Q&A over uploaded documents. The app supports various file formats and provides context-aware responses using the Gemini large language model.
📌 Note: This project was developed as part of the Week 8 Assignment in the Celebal Summer Internship program.
Dataset Resource: Kaggle - Loan Approval Prediction
▶️ Watch Demo Video to learn how to use the chatbot.
## Features

- **Modular Design**: Clean, maintainable code organized into distinct modules:
  - `config.py` – Environment and configuration settings
  - `utils.py` – File reading, cleaning, and chunking
  - `retriever.py` – Embedding creation and FAISS-based similarity search
  - `gemini_qa.py` – Gemini API integration and response generation
  - `streamlit_app.py` – Streamlit frontend for an interactive user experience
- **Multi-File Support**: Upload and parse PDF, TXT, DOCX, and CSV files
- **Fast Retrieval**: Uses Sentence Transformers and `faiss-cpu` for efficient chunk retrieval
- **LLM Integration**: Uses Google Gemini via `google-generativeai` to generate high-quality answers
- **Session State**: Streamlit session state retains data and avoids redundant processing
- **Optional Context Viewer**: Toggle to view the retrieved context chunks passed to the LLM
- **Basic Error Handling**: Handles missing files, API errors, and corrupted inputs gracefully
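The repository's own chunking code in `utils.py` is not reproduced here, but a minimal sketch of one common approach (fixed-size character chunks with overlap; the function name and default sizes are illustrative, not the project's actual values) looks like this:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split cleaned document text into overlapping fixed-size chunks.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back by `overlap` characters
    return chunks
```

Each chunk is later embedded individually, so the chunk size trades off retrieval granularity against embedding cost.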
## Project Structure

```text
RAG-QnA-Chatbot/
│
├── .env                  # Stores your Gemini API key
├── .gitignore            # Prevents sensitive or unnecessary files from being committed
├── config.py             # Configuration file for environment variables
├── utils.py              # File reading, text preprocessing, and chunking
├── retriever.py          # Embedding generation and FAISS-based context retrieval
├── gemini_qa.py          # Interacts with the Google Gemini API for answer generation
├── streamlit_app.py      # Main Streamlit frontend app
├── requirements.txt      # All required Python dependencies
├── README.md             # Project overview and setup guide
│
├── data/                 # Sample input files for testing the chatbot
│   ├── Sample_Submission.csv
│   ├── Test Dataset.csv
│   └── Training Dataset.csv
│
├── demo/                 # Screenshots and demo video for the chatbot
│   ├── brief_explanation.png
│   ├── imp_factor.png
│   └── chatbot_demo.mp4
│
├── .streamlit/           # Streamlit configuration (optional)
│   └── config.toml
│
└── __pycache__/          # Auto-generated Python cache files (ignored by Git)
```
## Setup

Clone the repository:

```bash
git clone https://github.com/ShubhamS168/RAG-QnA-chatbot.git
```

Create a `.env` file in the project root containing your Gemini API key:

```
GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
```

Replace `"YOUR_GEMINI_API_KEY"` with your actual Google Gemini API key.
The included `.gitignore` keeps secrets and generated files out of version control:

```
.env
__pycache__/
.DS_Store
*.pyc
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Start the Streamlit app:

```bash
streamlit run streamlit_app.py
```

Then open the Local URL (http://localhost:8501) shown in your terminal to launch the app.
```text
(venv) G:\Download\RAG-QnA-chatbot>streamlit run streamlit_app.py

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://10.19.17.113:8501
External URL: http://223.228.149.133:8501
```
Once the app launches in your browser, you can:
- Upload documents (PDF, TXT, DOCX, CSV).
- Ask any question based on their content.
- View the LLM-generated answer and optionally the supporting context.
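Under the hood, `gemini_qa.py` combines the retrieved chunks with the user's question into a single grounded prompt before calling the model. A minimal sketch of that pattern (the function name and prompt wording are illustrative, not the repository's exact code):

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a grounded prompt from retrieved chunks and the user question.

    The resulting string would then be sent to Gemini, e.g. via the
    google-generativeai SDK's GenerativeModel.generate_content(prompt).
    """
    context = "\n\n---\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Instructing the model to rely only on the supplied context is what makes the answers document-grounded rather than free-form generation.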
Figure 1: 🧾 The chatbot explains the structure of the uploaded loan dataset, listing all features such as Loan ID, Gender, ApplicantIncome, and more in response to a user query.
Figure 2: ✅ The chatbot identifies key factors influencing loan approval—highlighting Credit_History and Loan_Status—based on analysis of the dataset.
## Requirements

```text
streamlit
python-dotenv
pandas
docx2txt
PyMuPDF
google-generativeai
sentence-transformers
faiss-cpu
```
| Component | Description |
|---|---|
| Frontend | Streamlit dashboard with file uploader and chat box |
| Embedding | Sentence Transformers + FAISS index |
| LLM | Google Gemini via the `google-generativeai` SDK |
| Persistence | Streamlit session state for caching results |
| Security | API key stored in `.env`, outside version control |
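The Embedding row above is the heart of retrieval. A dependency-free sketch of the same idea in NumPy (cosine-similarity search over normalized vectors, which is equivalent to what FAISS's `IndexFlatIP` computes on normalized embeddings; the function name is illustrative):

```python
import numpy as np


def retrieve_top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k chunks most similar to the query.

    chunk_vecs: (n_chunks, dim) embedding matrix; query_vec: (dim,) vector.
    Normalizing both sides makes the inner product equal cosine similarity.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q                        # cosine similarity per chunk
    return np.argsort(-scores)[:k].tolist()
```

FAISS does the same search with an optimized index, which matters once the chunk count grows beyond a few thousand.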
- Author: Shubham Sourav – Data Science Intern at Celebal Technologies
- Dataset: Any user-uploaded documents (tested with the Kaggle Loan Approval Prediction dataset)
- Resources Used:
## RAG Q&A Chatbot with Generative AI
The objective of this project is to build and deploy a Retrieval-Augmented Generation (RAG) chatbot using Streamlit that intelligently answers user questions based on uploaded documents. Leveraging both embedding-based retrieval and powerful generative models, the system aims to:
- 📁 Let users upload documents (e.g., PDFs, CSVs, DOCX, TXT)
- 🧠 Use retrieval techniques (FAISS + embeddings) to fetch relevant context from the uploaded files
- ✨ Generate context-aware answers using LLMs (e.g., Gemini, OpenAI, Claude, Grok, or lightweight Hugging Face models)
- 💬 Offer an interactive Streamlit interface for uploading files and querying them
- ✅ Work even with limited/free access APIs to ensure cost-efficiency and accessibility
This chatbot is tested with real-world data like the Loan Approval Prediction dataset on Kaggle, making it a robust example of combining NLP, document understanding, and LLM deployment for real applications.
For any queries, feedback, or collaboration, feel free to connect:
📧 Email: shubhamsourav475@gmail.com
📝 Note:
This repository is maintained as part of the CSI (Celebal Summer Internship) program and is intended for educational use.
Distributed under the MIT License.
© 2025 Shubham Sourav. All rights reserved.