📄🤖 RAG Q&A Chatbot for Loan Approval Dataset

Deployed Chatbot: RAG Q&A Chatbot

A Retrieval-Augmented Generation (RAG) based chatbot powered by Google Gemini, designed for intelligent Q&A over uploaded documents. The app supports various file formats and provides context-aware responses using the Gemini large language model.

📌 Note: This project was developed as part of the Week 8 Assignment in the Celebal Summer Internship program.


Dataset Resource: Kaggle - Loan Approval Prediction


🎥 Demo – Chatbot Screen Recording

Download or View Chatbot Demo (MP4)

OR

▶️ Watch the demo video to see how to use the chatbot.

🚀 Features

  • Modular Design: Clean, maintainable code organized into distinct modules:

    • config.py – Environment and configuration settings
    • utils.py – File reading, cleaning, and chunking
    • retriever.py – Embedding creation and FAISS-based similarity search
    • gemini_qa.py – Gemini API integration and response generation
    • streamlit_app.py – Streamlit frontend for interactive user experience
  • Multi-File Support: Upload and parse PDF, TXT, DOCX, and CSV files

  • Fast Retrieval: Uses Sentence Transformers and faiss-cpu for efficient chunk retrieval

  • LLM Integration: Uses Google Gemini via google-generativeai to generate high-quality answers

  • Session State: Streamlit session state is used to retain data and avoid redundant processing

  • Optional Context Viewer: Toggle to view retrieved context chunks used by the LLM

  • Basic Error Handling: Handles missing files, API errors, and corrupted inputs gracefully
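The chunking step in utils.py can be sketched roughly as follows. This is a minimal sketch: the function name, chunk size, and overlap values are illustrative assumptions, not the repo's actual parameters.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-based chunks.

    Overlap keeps context that straddles a chunk boundary retrievable.
    chunk_size and overlap here are illustrative defaults.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: a 500-word document with 200-word chunks and 50-word overlap
sample = " ".join(f"w{i}" for i in range(500))
chunks = chunk_text(sample)
print(len(chunks))  # number of chunks produced
```

The overlap means the last 50 words of one chunk reappear at the start of the next, so an answer spanning a boundary is still retrievable from at least one chunk.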


📁 Project Structure

RAG-QnA-Chatbot/
│
├── .env                # Stores your Gemini API key
├── .gitignore          # Prevents sensitive or unnecessary files from being committed
├── config.py           # Configuration file for environment variables
├── utils.py            # File reading, text preprocessing, and chunking
├── retriever.py        # Embedding generation and FAISS-based context retrieval
├── gemini_qa.py        # Interacts with Google Gemini API for answer generation
├── streamlit_app.py    # Main Streamlit frontend app
├── requirements.txt    # All required Python dependencies
├── README.md           # Project overview and setup guide
│
├── data/               # Sample input files for testing the chatbot
│ ├── Sample_Submission.csv
│ ├── Test Dataset.csv
│ └── Training Dataset.csv
│
├── demo/               # Screenshots and demo video for the chatbot
│ ├── brief_explanation.png
│ ├── imp_factor.png
│ └── chatbot_demo.mp4
│
├── .streamlit/         # Streamlit configuration (optional)
│ └── config.toml
│
└── __pycache__/        # Auto-generated Python cache files (ignored in Git)

🛠️ Setup Instructions

1. Clone the Repo & Navigate to Directory

git clone https://github.com/ShubhamS168/RAG-QnA-chatbot.git
cd RAG-QnA-chatbot

2. Create and Configure .env

GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

Replace "YOUR_GEMINI_API_KEY" with your actual Google Gemini API key.
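config.py presumably reads this key with python-dotenv. A minimal stdlib-only stand-in for that behaviour looks like this (the function name is illustrative; the repo itself uses the python-dotenv package):

```python
import os

def load_env(path=".env"):
    """Parse KEY=VALUE lines from a .env file (stand-in for dotenv.load_dotenv)."""
    env = {}
    try:
        with open(path) as f:
            for raw in f:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip().strip('"')
    except FileNotFoundError:
        pass
    return env

# Fall back to the process environment if no .env file is present
GEMINI_API_KEY = load_env().get("GEMINI_API_KEY") or os.getenv("GEMINI_API_KEY")
```

Keeping the key in .env (and .env in .gitignore, as in step 3) prevents it from being committed to the repository.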

3. Create .gitignore

.env
__pycache__/
.DS_Store
*.pyc

4. Install Requirements

pip install -r requirements.txt

▶️ Run the App

Start the Streamlit app:

streamlit run streamlit_app.py

In your terminal, click the Local URL (http://localhost:8501) to open the app in your browser:

(venv) G:\Download\RAG-QnA-chatbot>streamlit run streamlit_app.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://10.19.17.113:8501
  External URL: http://223.228.149.133:8501

Once the app launches in your browser, you can:

  1. Upload documents (PDF, TXT, DOCX, CSV).
  2. Ask any question based on their content.
  3. View the LLM-generated answer and optionally the supporting context.
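Under the hood, answering a question starts by embedding it and comparing it against the stored chunk embeddings. A brute-force cosine-similarity version of that search, a stand-in for the FAISS index and the real sentence-transformers embeddings, using toy vectors, looks like:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the query.

    A FAISS flat index performs this same nearest-neighbour search,
    just far faster at scale; this loop spells out the idea.
    """
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: cosine(query_vec, chunk_vecs[i]),
                   reverse=True)
    return order[:k]

# Toy 3-d "embeddings" for three chunks and one query
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query_vec = [1.0, 0.05, 0.0]
print(top_k(query_vec, chunk_vecs))  # indices of the two closest chunks
```

The text of the top-ranked chunks is what the app shows in the optional context viewer and what it passes to Gemini.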

📸 UI Preview


Figure 1: 🧾 The chatbot explains the structure of the uploaded loan dataset, listing all features such as Loan ID, Gender, ApplicantIncome, and more in response to a user query.


Figure 2: ✅ The chatbot identifies key factors influencing loan approval—highlighting Credit_History and Loan_Status—based on analysis of the dataset.


📦 Requirements

streamlit
python-dotenv
pandas
docx2txt
PyMuPDF
google-generativeai
sentence-transformers
faiss-cpu

✅ Summary

Component   | Description
------------|-----------------------------------------------------
Frontend    | Streamlit dashboard with file uploader and chat box
Embedding   | Sentence Transformers + FAISS index
LLM         | Google Gemini via google-generativeai SDK
Persistence | Streamlit session state for caching results
Security    | API key stored securely in .env
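The final step, handled by gemini_qa.py, stuffs the retrieved chunks into the prompt sent to Gemini. The assembly presumably looks something like this (the template wording and function name are illustrative; the actual google-generativeai call is shown only as a comment):

```python
def build_prompt(question, context_chunks):
    """Assemble a grounded prompt: instructions, retrieved context, question."""
    context = "\n\n".join(f"[Chunk {i + 1}]\n{c}"
                          for i, c in enumerate(context_chunks))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Which factor most affects approval?",
    ["Credit_History strongly correlates with Loan_Status."],
)
# prompt would then be passed to something like:
#   genai.GenerativeModel("gemini-1.5-flash").generate_content(prompt)
```

Instructing the model to answer only from the supplied context is what keeps responses grounded in the uploaded documents rather than the model's general knowledge.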



📚 Project Goal Reminder

RAG Q&A Chatbot with Generative AI

The objective of this project is to build and deploy a Retrieval-Augmented Generation (RAG) chatbot using Streamlit that intelligently answers user questions based on uploaded documents. Leveraging both embedding-based retrieval and powerful generative models, the system aims to:

  • 📁 Let users upload documents (e.g., PDFs, CSVs, DOCX, TXT)
  • 🧠 Use retrieval techniques (FAISS + embeddings) to fetch relevant context from the uploaded files
  • ✨ Generate context-aware answers using LLMs (e.g., Gemini, OpenAI, Claude, Grok, or lightweight Hugging Face models)
  • 💬 Offer an interactive Streamlit interface for uploading files and querying them
  • ✅ Work even with limited/free access APIs to ensure cost-efficiency and accessibility

This chatbot is tested with real-world data like the Loan Approval Prediction dataset on Kaggle, making it a robust example of combining NLP, document understanding, and LLM deployment for real applications.


📬 Contact

For any queries, feedback, or collaboration, feel free to connect:

📧 Email: shubhamsourav475@gmail.com


📝 Note:
This repository is maintained as part of the CSI (Celebal Summer Internship) program and is intended for educational use.

🪪 License

Distributed under the MIT License.
© 2025 Shubham Sourav. All rights reserved.
