A local AI assistant that retrieves and chats over website content using a Retrieval-Augmented Generation (RAG) pipeline.
Built with Ollama, LangChain, and ChromaDB for private, on-device knowledge-based conversations.
The project is broken down into three main stages:

- **Crawl** (`scraper.py`): An automated script starts at a specified URL and crawls through the website, extracting all the clean, textual content from each page.
- **Ingest** (`ingest.py`): The scraped text is broken down into smaller chunks, converted into numerical vector embeddings, and stored in a local vector database (ChromaDB). This database acts as the AI's "brain."
- **Chat** (`app.py`): A Streamlit web application provides a user-friendly chat interface. When a user asks a question, the app retrieves the most relevant information from the database and uses an AI language model to generate a context-aware answer.
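The splitting step of the Ingest stage can be illustrated with a naive fixed-size chunker. This is a simplified stand-in for the text splitters LangChain actually provides, and `chunk_text` is a hypothetical helper, not part of the project:

```python
# Naive fixed-size chunking with overlap, so context isn't cut off
# abruptly at chunk boundaries (hypothetical illustration only).
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

# A 1200-character document yields three overlapping chunks.
chunks = chunk_text("x" * 1200)
print(len(chunks))      # → 3
print(len(chunks[0]))   # → 500
```

In the real pipeline, each chunk would then be embedded (here, via the `nomic-embed-text` model) and written to ChromaDB.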
- **Ollama**: Make sure Ollama is installed and running. You can get it from https://ollama.com/.
- **AI Models**: Pull the necessary models by running the following commands in your terminal:

  ```bash
  ollama pull nomic-embed-text
  ollama pull phi3
  ```

- **Python**: Ensure you have Python 3.9 or newer.
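Since the whole pipeline depends on Ollama being up, a quick reachability check can save debugging time. This sketch assumes Ollama's default address of `http://localhost:11434`; the function name is hypothetical:

```python
import urllib.request
import urllib.error

def ollama_is_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    # Ollama answers plain HTTP on its root endpoint when the server is up;
    # any connection error means it isn't reachable.
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Ollama running:", ollama_is_running())
```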
- **Clone the Repository**: Get the project files onto your local machine.
- **Create a Virtual Environment**:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- **Install Dependencies**:

  ```bash
  pip install -r requirements.txt
  ```
Open the `config.py` file. This is the only file you need to edit.

- Set `START_URL` to the homepage of the website you want the bot to learn from.
- Customize `APP_TITLE`, `SUBJECT_NAME`, `ASSISTANT_NAME`, etc., to define your bot's identity.
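For reference, a `config.py` along these lines is what the scripts expect. The variable names come from the settings listed above; the values shown are purely illustrative:

```python
# config.py — illustrative values; replace with your own site and bot identity.
START_URL = "https://example.com"      # homepage the scraper starts from
APP_TITLE = "Example Docs Assistant"   # title shown in the Streamlit app
SUBJECT_NAME = "Example Docs"          # the subject the bot has learned
ASSISTANT_NAME = "DocsBot"             # how the bot refers to itself
```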
Run the data processing scripts in order. This only needs to be done once for each new website.

- **Run the Scraper**: This will crawl the website defined in your config and create `scraped_content.json`.

  ```bash
  python scraper.py
  ```

- **Run the Ingestion Script**: This will process the JSON file and create the `chroma_db` vector store.

  ```bash
  python ingest.py
  ```
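The scraper's core decision is which discovered links to follow: it should stay on the same domain as `START_URL`. A minimal sketch of that filter, with a hypothetical helper name:

```python
from urllib.parse import urljoin, urlparse

def same_domain_links(page_url: str, hrefs: list[str]) -> list[str]:
    # Resolve relative hrefs against the current page, then keep only
    # links whose host matches the page's host (stay inside the website).
    host = urlparse(page_url).netloc
    resolved = (urljoin(page_url, h) for h in hrefs)
    return [u for u in resolved if urlparse(u).netloc == host]

links = same_domain_links(
    "https://example.com/docs/",
    ["intro.html", "/about", "https://other.org/page"],
)
print(links)  # → ['https://example.com/docs/intro.html', 'https://example.com/about']
```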
Now you can start the web application:

```bash
streamlit run app.py
```
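Under the hood, answering a question boils down to: embed the question, find the nearest stored chunks, and pass them to the language model as context. A toy version of the nearest-chunk lookup is shown below; the real embeddings come from `nomic-embed-text` and the lookup is handled by ChromaDB, so these short vectors and helper names are made up for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], store, k: int = 2) -> list[str]:
    # store: list of (chunk_text, embedding) pairs, as held in the vector DB.
    ranked = sorted(store, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

store = [
    ("Pricing starts at $10/month.", [0.9, 0.1, 0.0]),
    ("Our office is in Berlin.",     [0.0, 0.2, 0.9]),
    ("Annual plans get a discount.", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], store, k=2))
# → ['Pricing starts at $10/month.', 'Annual plans get a discount.']
```

The retrieved chunks are then inserted into the prompt so that `phi3` can generate a context-aware answer.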