brittojo7n/Retrieval-Augmented-Chatbot
Retrieval Augmented Chatbot

A local AI assistant that retrieves website content and chats over it using a Retrieval-Augmented Generation (RAG) pipeline.

Built with Ollama, LangChain, and ChromaDB for private, on-device knowledge-based conversations.


How It Works

The project is broken down into three main stages:

  1. Crawl (scraper.py): An automated script starts at a specified URL, crawls the website, and extracts the clean textual content from each page.

  2. Ingest (ingest.py): The scraped text is broken down into smaller chunks, converted into numerical vector embeddings, and stored in a local vector database (ChromaDB). This database acts as the AI's "brain."

  3. Chat (app.py): A Streamlit web application provides a user-friendly chat interface. When a user asks a question, the app retrieves the most relevant information from the database and uses an AI language model to generate a context-aware answer.
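The chunking idea from the Ingest stage can be sketched in plain Python. This is a simplified stand-in, not the repository's code — the project presumably uses a LangChain text splitter, and the chunk size and overlap values below are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context isn't lost at chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # advance by less than chunk_size to create overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```

Each chunk is later converted to a vector embedding and stored; the overlap keeps sentences that straddle a boundary retrievable from either side.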

How to Use

Step 1: Prerequisites

  • Ollama: Make sure Ollama is installed and running. You can get it from https://ollama.com/.

  • AI Models: Pull the necessary models by running the following commands in your terminal:

    ollama pull nomic-embed-text
    ollama pull phi3
  • Python: Ensure you have Python 3.9 or newer.

Step 2: Setup

  1. Clone the Repository: Get the project files onto your local machine.

  2. Create a Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt

Step 3: Configure Your Assistant

Open the config.py file. This is the only file you need to edit.

  • Set the START_URL to the homepage of the website you want the bot to learn from.

  • Customize the APP_TITLE, SUBJECT_NAME, ASSISTANT_NAME, etc., to define your bot's identity.
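As an illustration, config.py might look like the following. The variable names come from the description above; the values are placeholders, not the repository's defaults:

```python
# config.py -- all user-facing settings live here (example values only)
START_URL = "https://example.com"      # homepage the scraper starts crawling from
APP_TITLE = "Example Docs Assistant"   # title shown in the Streamlit chat app
SUBJECT_NAME = "Example Docs"          # the subject the bot claims expertise in
ASSISTANT_NAME = "DocBot"              # name the bot uses for itself
```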

Step 4: Build the Knowledge Base

Run the data processing scripts in order. This only needs to be done once for each new website.

  1. Run the Scraper: This will crawl the website defined in your config and create scraped_content.json.

    python scraper.py
  2. Run the Ingestion Script: This will process the JSON file and create the chroma_db vector store.

    python ingest.py

Step 5: Launch the Chatbot

Now you can start the web application.

streamlit run app.py
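Under the hood, answering a question reduces to nearest-neighbour search over the stored embeddings plus prompt assembly. Here is a library-free sketch with toy vectors — the real app uses ChromaDB for storage and nomic-embed-text for embeddings; the three-dimensional vectors below are fabricated purely for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the context-stuffed prompt handed to the language model."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy store: (chunk_text, fake 3-d embedding)
store = [
    ("Shipping takes 3-5 days.", [0.9, 0.1, 0.0]),
    ("Returns are free within 30 days.", [0.1, 0.9, 0.0]),
    ("Our office is in Oslo.", [0.0, 0.1, 0.9]),
]
top = retrieve([0.8, 0.2, 0.0], store, k=1)  # closest chunk to the query vector
```

In the actual app, ChromaDB performs this similarity search and the assembled prompt is sent to the phi3 model via Ollama.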
