Draft
62 commits
9b5e007
Merge pull request #100 from edubotics-ai/dev_branch
trgardos Aug 28, 2024
e26dba6
readme iteration 1
XThomasBU Aug 28, 2024
edc6f13
Merge pull request #101 from edubotics-ai/readme_iteration
trgardos Aug 29, 2024
8ea5331
langchain release patch
Farid-Karimli Aug 29, 2024
952d3d0
Tagging releases and push
Farid-Karimli Aug 29, 2024
cb555d8
black format changes
XThomasBU Aug 29, 2024
02a1d52
changed to publish on release
XThomasBU Aug 29, 2024
799faec
Merge pull request #102 from edubotics-ai/Farid-patch
Farid-Karimli Aug 29, 2024
d0572b6
OpenAI API Key added to constants
XThomasBU Aug 29, 2024
b3028b7
fix
XThomasBU Aug 29, 2024
918240d
initial commit for langgraph implementation
XThomasBU Aug 30, 2024
2822a5c
workflow to sync apps dir with a PR
XThomasBU Aug 30, 2024
90d4d47
minor update
XThomasBU Aug 30, 2024
55e26e5
Merge pull request #105 from edubotics-ai/sync_apps_dir
Farid-Karimli Aug 30, 2024
3e47786
Merge pull request #103 from edubotics-ai/Xavier-patch
Farid-Karimli Aug 30, 2024
bb8c87b
Fix for sync workflow (#106)
XThomasBU Aug 30, 2024
70c849e
Update sync_apps_dir_with_edubotics_app.yml
XThomasBU Aug 30, 2024
4755eb6
removing sync workflow -- did not work
XThomasBU Aug 30, 2024
70c94e3
updates
XThomasBU Aug 30, 2024
879d1e1
updates
XThomasBU Aug 31, 2024
505f7e7
updates
XThomasBU Aug 31, 2024
b48faa7
more updates
XThomasBU Sep 2, 2024
3a92f7d
Initial attempt at gh repo loading
Farid-Karimli Sep 9, 2024
d348fe5
changes
xavierohan Sep 10, 2024
e87a84b
Local notebook loading and vectorstore manager changes
Farid-Karimli Sep 11, 2024
fa8b983
Merge branch 'notebook-loading' into nb_loading_xavier
xavierohan Sep 12, 2024
ba9f2f4
fix
XThomasBU Sep 12, 2024
f6b46bd
updates
XThomasBU Sep 12, 2024
1070917
update
XThomasBU Sep 12, 2024
2b361ce
updates
XThomasBU Sep 12, 2024
d9a7f52
minor changes
Farid-Karimli Sep 13, 2024
f773bed
cleanup
xavierohan Sep 14, 2024
eb867a3
Dataloader changes
Farid-Karimli Sep 16, 2024
667b079
Merge branch 'notebook-loading' into nb_loading_xavier
Farid-Karimli Sep 16, 2024
a5156f5
Merge pull request #113 from edubotics-ai/nb_loading_xavier
Farid-Karimli Sep 16, 2024
bfbccb1
minor change
Farid-Karimli Sep 16, 2024
c433a05
Update README.md
trgardos Sep 18, 2024
f545a8c
read raw cells in notebook
Farid-Karimli Sep 19, 2024
7928533
Collect different base_url links, but dont descend
Farid-Karimli Sep 20, 2024
c259a90
minor change
Farid-Karimli Sep 20, 2024
9fc7a8e
Markdown splitter
Farid-Karimli Sep 22, 2024
c37a9ca
format change
Farid-Karimli Sep 22, 2024
11c01bd
minor
Farid-Karimli Sep 22, 2024
636db06
Initial implementation of better metadata extraction
Farid-Karimli Sep 22, 2024
c44e48e
.env fix
Farid-Karimli Sep 22, 2024
8d8e8be
import change due to chainlit update
Farid-Karimli Sep 22, 2024
4e2b433
Rollback some changes
Farid-Karimli Sep 22, 2024
4f6892a
Bug fix
Farid-Karimli Sep 23, 2024
fe65eb4
pydantic import change
Farid-Karimli Sep 23, 2024
c62938b
minor changes
Farid-Karimli Sep 23, 2024
fb1dcbf
Minor changes and freeze dep. versions
Farid-Karimli Sep 25, 2024
c18804e
Changes for linter
Farid-Karimli Sep 25, 2024
5a1e169
whitespace change
Farid-Karimli Sep 25, 2024
870837a
add timeouts
Farid-Karimli Sep 25, 2024
1a7905f
remove try-except-pass
Farid-Karimli Sep 25, 2024
4aabcb0
linter change
Farid-Karimli Sep 25, 2024
e3fccb5
Merge pull request #116 from edubotics-ai/notebook-loading
Farid-Karimli Sep 25, 2024
d59ed7a
minor change to reqa
Farid-Karimli Sep 25, 2024
c14958e
Merge pull request #117 from edubotics-ai/reqs-patch
trgardos Sep 25, 2024
96db374
Merge branch 'main' into langraph_implementation
XThomasBU Sep 27, 2024
7d4a323
Update helpers.py
XThomasBU Sep 27, 2024
96336b7
format changes
XThomasBU Sep 27, 2024
7 changes: 2 additions & 5 deletions .github/workflows/publish.yml
@@ -1,11 +1,8 @@
name: Publish to PyPI

on:
  push:
    branches:
      - main
    tags:
      - "v*"
  release:
    types: [published]

jobs:
  deploy:
41 changes: 29 additions & 12 deletions README.md
@@ -1,16 +1,33 @@
# edubotics-core

## Welcome to edubotics-core by Edubotics AI! 👋

![PyPI](https://img.shields.io/pypi/v/edubotics-core.svg)
![GitHub stars](https://img.shields.io/github/stars/edubotics-ai/edubot-core.svg)
![License](https://img.shields.io/github/license/edubotics-ai/edubot-core.svg)
![PyPI Downloads](https://img.shields.io/pypi/dm/edubotics-core.svg)
[![GitHub Contributors](https://img.shields.io/github/contributors/edubotics-ai/edubot-core)](https://github.com/edubotics-ai/edubot-core/graphs/contributors)
<p align="center">
<a href="http://docs.edubotics.ai/">
<img src="https://github.com/edubotics-ai/.github/blob/main/assets/images/edubot-mascot.png?raw=true" alt="edubotics-ai" width="10%" height="10%">
</a>
</p>
<p align="center">
<em>Edubotics AI - Empower Education with AI: Create Intelligent Chatbots Quickly and Efficiently</em>
</p>
<p align="center">
<a href="https://github.com/edubotics-ai/edubot-core">
<img src="https://img.shields.io/pypi/v/edubotics-core.svg" alt="PyPI">
</a>
<a href="https://github.com/edubotics-ai/edubot-core">
<img src="https://img.shields.io/github/stars/edubotics-ai/edubot-core.svg" alt="GitHub stars">
</a>
<a href="https://github.com/edubotics-ai/edubot-core">
<img src="https://img.shields.io/github/license/edubotics-ai/edubot-core.svg" alt="License">
</a>
<a href="https://pypi.org/project/edubotics-core">
<img src="https://img.shields.io/pypi/dm/edubotics-core.svg" alt="PyPI Downloads">
</a>
<a href="https://github.com/edubotics-ai/edubot-core/graphs/contributors">
<img src="https://img.shields.io/github/contributors/edubotics-ai/edubot-core.svg" alt="GitHub Contributors">
</a>
</p>

**Empower Education with AI: Create Intelligent Chatbots Quickly and Efficiently 🚀**
## Welcome to edubotics-core by Edubotics AI! 👋

edubotics-core is an open-source Python library that allows developers to build LLM-based chatbots efficiently. It provides a comprehensive set of core modules for vector storage, retrieval, processing, with more to come.
**edubotics-core** is an open-source Python library that allows developers to build LLM-based chatbots efficiently. It provides a comprehensive set of core modules for vector storage, retrieval, processing, with more to come.

## 🛠 Installation

@@ -20,7 +37,7 @@ You can install edubotics-core using pip:
pip install edubotics-core
```

Full documentation can be found [here](https://edubotics-ai.github.io/edubot-core/).
Full documentation can be found [here](http://docs.edubotics.ai/).

## ✨ Key Features
- Modular and Extensible: Easily create, modify, and extend to the core modules.
@@ -38,4 +55,4 @@ We welcome contributions to edubotics-core! If you're interested in contributing

## 📜 License

edubotics-core is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
edubotics-core is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
11 changes: 11 additions & 0 deletions apps/ai_tutor/.env.example
@@ -0,0 +1,11 @@
LLAMA_CLOUD_API_KEY=
OPENAI_API_KEY=
HF_TOKEN=
HUGGING_FACE_TOKEN=
LITERAL_API_KEY_LOGGING=
LITERAL_API_URL=https://cloud.getliteral.ai
CHAINLIT_AUTH_SECRET=
CHAINLIT_URL=http://localhost:8000
OAUTH_GOOGLE_CLIENT_ID=
OAUTH_GOOGLE_CLIENT_SECRET=
EMAIL_ENCRYPTION_KEY=
2 changes: 1 addition & 1 deletion apps/ai_tutor/chainlit_app.py
@@ -10,7 +10,6 @@
import chainlit as cl
from edubotics_core.chat.llm_tutor import LLMTutor
from edubotics_core.chat.helpers import (
    get_sources,
    get_history_chat_resume,
    get_history_setup_llm,
    # get_last_config,
@@ -22,6 +21,7 @@
from helpers import (
    check_user_cooldown,
    reset_tokens_for_user,
    get_sources,
)
from helpers import get_time
import copy
4 changes: 2 additions & 2 deletions apps/ai_tutor/config/config.yml
@@ -8,7 +8,7 @@ vectorstore:
data_path: 'storage/data' # str
url_file_path: 'storage/data/urls.txt' # str
expand_urls: True # bool
db_option : 'RAGatouille' # str [FAISS, Chroma, RAGatouille, RAPTOR]
db_option : 'FAISS' # str [FAISS, Chroma, RAGatouille, RAPTOR]
db_path : 'vectorstores' # str
model : 'sentence-transformers/all-MiniLM-L6-v2' # str [sentence-transformers/all-MiniLM-L6-v2, text-embedding-ada-002']
search_top_k : 3 # int
@@ -25,7 +25,7 @@ vectorstore:
index_name: "new_idx" # str

llm_params:
llm_arch: 'langchain' # [langchain]
llm_arch: 'langgraph' # [langchain, langgraph]
use_history: True # bool
generate_follow_up: False # bool
memory_window: 3 # int
11 changes: 10 additions & 1 deletion apps/ai_tutor/config/config_manager.py
@@ -1,6 +1,7 @@
from pydantic import BaseModel, conint, confloat, HttpUrl
from typing import Optional, List
from typing import Optional, List, Dict, Any
import yaml
from .prompts import prompts


class FaissParams(BaseModel):
@@ -112,6 +113,10 @@ class APIConfig(BaseModel):
    timeout: conint(gt=0) = 60


class PromptsConfig(BaseModel):
    prompts: Dict[str, Any] = prompts


class Config(BaseModel):
    log_dir: str = "storage/logs"
    log_chunk_dir: str = "storage/logs/chunks"
@@ -126,6 +131,7 @@ class Config(BaseModel):
    token_config: TokenConfig
    misc: MiscConfig
    api_config: APIConfig
    prompts_dict: PromptsConfig = PromptsConfig(prompts=prompts)


class ConfigManager:
@@ -142,6 +148,9 @@ def load_config(self) -> Config:
        with open(self.project_config_path, "r") as f:
            project_config_data = yaml.safe_load(f)

        # Add prompts to the project config
        project_config_data["prompts_dict"] = prompts

        # Merge the two configurations
        merged_config = {**config_data, **project_config_data}
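Note on the merge in `load_config`: `{**config_data, **project_config_data}` is a shallow merge, so a top-level key from the project config replaces the whole value from the base config, nested dictionaries included. A minimal sketch of that behaviour, assuming made-up values (the keys and values below are illustrative and are not taken from the project's YAML files):

```python
# Hypothetical stand-ins for the two parsed YAML files.
config_data = {
    "log_dir": "storage/logs",
    "llm_params": {"llm_arch": "langchain", "use_history": True},
}
project_config_data = {
    "llm_params": {"llm_arch": "langgraph"},
    "prompts_dict": {"gpt-4o-mini": {"rephrase_prompt": "..."}},
}

# Same expression as in load_config: later keys win, one level deep only.
merged_config = {**config_data, **project_config_data}

print(merged_config["log_dir"])     # untouched key survives
print(merged_config["llm_params"])  # nested dict is replaced, not merged
```

If per-key overrides inside nested sections are ever needed, a recursive merge would be required; the diff does not indicate that this is intended.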
2 changes: 1 addition & 1 deletion apps/ai_tutor/config/prompts.py
@@ -1,5 +1,5 @@
prompts = {
    "openai": {
    "gpt-4o-mini": {
        "rephrase_prompt": (
            "You are someone that rephrases statements. Rephrase the student's question to add context from their chat history if relevant, ensuring it remains from the student's point of view. "
            "Incorporate relevant details from the chat history to make the question clearer and more specific. "
87 changes: 87 additions & 0 deletions apps/ai_tutor/helpers.py
@@ -1,6 +1,93 @@
from datetime import datetime, timedelta, timezone
import tiktoken
from edubotics_core.chat_processor.helpers import update_user_info, convert_to_dict
import chainlit as cl


def get_sources(res, answer, stream=True, view_sources=False):
    source_elements = []
    source_dict = {}  # Dictionary to store URL elements

    for idx, source in enumerate(res["context"]):
        print(source)
        source_metadata = source.metadata
        url = source_metadata.get("source", "N/A")
        score = source_metadata.get("score", "N/A")
        page = source_metadata.get("page", 1)

        lecture_tldr = source_metadata.get("tldr", "N/A")
        lecture_recording = source_metadata.get("lecture_recording", "N/A")
        suggested_readings = source_metadata.get("suggested_readings", "N/A")
        date = source_metadata.get("date", "N/A")

        source_type = source_metadata.get("source_type", "N/A")

        url_name = f"{url}_{page}"
        if url_name not in source_dict:
            source_dict[url_name] = {
                "text": source.page_content,
                "url": url,
                "score": score,
                "page": page,
                "lecture_tldr": lecture_tldr,
                "lecture_recording": lecture_recording,
                "suggested_readings": suggested_readings,
                "date": date,
                "source_type": source_type,
            }
        else:
            source_dict[url_name]["text"] += f"\n\n{source.page_content}"

    full_answer = ""  # Not to include the answer again if streaming

    if not stream:  # First, display the answer if not streaming
        # full_answer = "**Answer:**\n"
        full_answer += answer

    if view_sources:
        # Then, display the sources
        # check if the answer has sources
        if len(source_dict) == 0:
            full_answer += "\n\n**No sources found.**"
            return full_answer, source_elements, source_dict
        else:
            full_answer += "\n\n**Sources:**\n"
            for idx, (url_name, source_data) in enumerate(source_dict.items()):
                full_answer += f"\nSource {idx + 1} (Score: {source_data['score']}): {source_data['url']}\n"

                name = f"Source {idx + 1} Text\n"
                full_answer += name
                source_elements.append(
                    cl.Text(name=name, content=source_data["text"], display="side")
                )

                # Add a PDF element if the source is a PDF file
                if source_data["url"].lower().endswith(".pdf"):
                    name = f"Source {idx + 1} PDF\n"
                    full_answer += name
                    pdf_url = f"{source_data['url']}#page={source_data['page']+1}"
                    source_elements.append(
                        cl.Pdf(name=name, url=pdf_url, display="side")
                    )

            full_answer += "\n**Metadata:**\n"
            for idx, (url_name, source_data) in enumerate(source_dict.items()):
                full_answer += f"\nSource {idx + 1} Metadata:\n"
                source_elements.append(
                    cl.Text(
                        name=f"Source {idx + 1} Metadata",
                        content=f"Source: {source_data['url']}\n"
                        f"Page: {source_data['page']}\n"
                        f"Type: {source_data['source_type']}\n"
                        f"Date: {source_data['date']}\n"
                        f"TL;DR: {source_data['lecture_tldr']}\n"
                        f"Lecture Recording: {source_data['lecture_recording']}\n"
                        f"Suggested Readings: {source_data['suggested_readings']}\n",
                        display="side",
                    )
                )

    return full_answer, source_elements, source_dict


def get_time():
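Note on the grouping step in `get_sources`: retrieved chunks are keyed by `f"{url}_{page}"`, so several chunks from the same page of the same source collapse into one entry whose text is concatenated. A minimal sketch of just that step, assuming a hypothetical `FakeDocument` class in place of the retriever's document objects and example URLs that are not from the project:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a retrieved document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def group_sources(documents):
    """Merge chunks that share the same (source URL, page) pair."""
    grouped = {}
    for doc in documents:
        url = doc.metadata.get("source", "N/A")
        page = doc.metadata.get("page", 1)  # same default as get_sources
        key = f"{url}_{page}"
        if key not in grouped:
            grouped[key] = {"text": doc.page_content, "url": url, "page": page}
        else:
            # Later chunks from the same page are appended to the first one.
            grouped[key]["text"] += f"\n\n{doc.page_content}"
    return grouped


docs = [
    FakeDocument("chunk A", {"source": "https://example.com/lecture.pdf", "page": 0}),
    FakeDocument("chunk B", {"source": "https://example.com/lecture.pdf", "page": 0}),
    FakeDocument("chunk C", {"source": "https://example.com/notes.html"}),
]
grouped = group_sources(docs)
for key, entry in grouped.items():
    print(key, "->", repr(entry["text"]))
```

The PDF link in `get_sources` uses `page + 1`, which suggests the `page` metadata is zero-based. If that reading is right, the fallback `page = 1` for sources without page metadata would point a PDF viewer at page 2; a fallback of `0` may be what is wanted, though the diff itself does not say.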
2 changes: 1 addition & 1 deletion apps/chainlit_base/chainlit_base.py
@@ -4,9 +4,9 @@
import chainlit as cl
from edubotics_core.chat.llm_tutor import LLMTutor
from edubotics_core.chat.helpers import (
    get_sources,
    get_history_setup_llm,
)
from helpers import get_sources
import copy
from langchain_community.callbacks import get_openai_callback
from config.config_manager import config_manager
28 changes: 26 additions & 2 deletions apps/chainlit_base/config/config_manager.py
@@ -1,6 +1,7 @@
from pydantic import BaseModel, conint, confloat, HttpUrl
from typing import Optional, List
from typing import Optional, List, Dict, Any
import yaml
from .prompts import prompts


class FaissParams(BaseModel):
@@ -24,7 +25,8 @@ class VectorStoreConfig(BaseModel):
    db_option: str = "RAGatouille"  # Options: [FAISS, Chroma, RAGatouille, RAPTOR]
    db_path: str = "vectorstores"
    model: str = (
        "sentence-transformers/all-MiniLM-L6-v2"  # Options: [sentence-transformers/all-MiniLM-L6-v2, text-embedding-ada-002]
        # Options: [sentence-transformers/all-MiniLM-L6-v2, text-embedding-ada-002]
        "sentence-transformers/all-MiniLM-L6-v2"
    )
    search_top_k: conint(gt=0) = 3
    score_threshold: confloat(ge=0.0, le=1.0) = 0.2
@@ -95,10 +97,26 @@ class MetadataConfig(BaseModel):
    slide_base_link: HttpUrl = "https://dl4ds.github.io"


class TokenConfig(BaseModel):
    cooldown_time: conint(gt=0) = 60
    regen_time: conint(gt=0) = 180
    tokens_left: conint(gt=0) = 2000
    all_time_tokens_allocated: conint(gt=0) = 1000000


class MiscConfig(BaseModel):
    github_repo: HttpUrl = "https://github.com/edubotics-ai/edubot-core"
    docs_website: HttpUrl = "https://dl4ds.github.io/dl4ds_tutor/"


class APIConfig(BaseModel):
    timeout: conint(gt=0) = 60


class PromptsConfig(BaseModel):
    prompts: Dict[str, Any] = prompts


class Config(BaseModel):
    log_dir: str = "storage/logs"
    log_chunk_dir: str = "storage/logs/chunks"
@@ -110,7 +128,10 @@ class Config(BaseModel):
    splitter_options: SplitterOptions
    retriever: RetrieverConfig
    metadata: MetadataConfig
    token_config: TokenConfig
    misc: MiscConfig
    api_config: APIConfig
    prompts_dict: PromptsConfig = PromptsConfig(prompts=prompts)


class ConfigManager:
@@ -127,6 +148,9 @@ def load_config(self) -> Config:
        with open(self.project_config_path, "r") as f:
            project_config_data = yaml.safe_load(f)

        # Add prompts to the project config
        project_config_data["prompts_dict"] = prompts

        # Merge the two configurations
        merged_config = {**config_data, **project_config_data}
26 changes: 26 additions & 0 deletions apps/chainlit_base/config/constants.py
@@ -0,0 +1,26 @@
from dotenv import load_dotenv
import os

load_dotenv()

# API Keys - Loaded from the .env file

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")
HUGGINGFACE_TOKEN = os.getenv("HUGGINGFACE_TOKEN")
LITERAL_API_KEY_LOGGING = os.getenv("LITERAL_API_KEY_LOGGING")
LITERAL_API_URL = os.getenv("LITERAL_API_URL")
CHAINLIT_URL = os.getenv("CHAINLIT_URL")
EMAIL_ENCRYPTION_KEY = os.getenv("EMAIL_ENCRYPTION_KEY")

OAUTH_GOOGLE_CLIENT_ID = os.getenv("OAUTH_GOOGLE_CLIENT_ID")
OAUTH_GOOGLE_CLIENT_SECRET = os.getenv("OAUTH_GOOGLE_CLIENT_SECRET")

opening_message = "Hey, What Can I Help You With?\n\nYou can ask me questions about the course logistics, course content, about the final project, or anything else!"
chat_end_message = (
    "I hope I was able to help you. If you have any more questions, feel free to ask!"
)

# Model Paths

LLAMA_PATH = "../storage/models/tinyllama"
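
Note on the environment variable names: `apps/ai_tutor/.env.example` lists `HF_TOKEN` and `HUGGING_FACE_TOKEN`, while this file reads `HUGGINGFACE_TOKEN`. The diff does not say whether these are meant to be the same setting. If they are, `os.getenv` would return `None` without any warning when only one of the other spellings is set. A minimal sketch of one way to guard against that, assuming a hypothetical helper `first_env` that is not part of the project:

```python
import os


def first_env(*names, default=None):
    """Return the value of the first variable in `names` that is set and non-empty."""
    for name in names:
        value = os.environ.get(name)
        if value:
            return value
    return default


# Demo values only; in the app these would come from the .env file.
os.environ.pop("HUGGINGFACE_TOKEN", None)
os.environ.pop("HUGGING_FACE_TOKEN", None)
os.environ["HF_TOKEN"] = "hf_demo_token"

# Try each spelling in turn and take the first one that is set.
HUGGINGFACE_TOKEN = first_env("HUGGINGFACE_TOKEN", "HUGGING_FACE_TOKEN", "HF_TOKEN")
print(HUGGINGFACE_TOKEN)
```

Settling on a single name in both `.env.example` and `constants.py` would be simpler than a fallback chain, if the three names do refer to the same token.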
2 changes: 1 addition & 1 deletion apps/chainlit_base/config/prompts.py
@@ -1,5 +1,5 @@
prompts = {
    "openai": {
    "gpt-4o-mini": {
        "rephrase_prompt": (
            "You are someone that rephrases statements. Rephrase the student's question to add context from their chat history if relevant, ensuring it remains from the student's point of view. "
            "Incorporate relevant details from the chat history to make the question clearer and more specific. "