byok openai embeddings added #248
@staru09 can you rename the file so its purpose is clear from the name?
Pull request overview
Adds a Python example demonstrating “bring your own embeddings” (BYO embeddings) with Moss by generating embeddings via OpenAI, ingesting them into a Moss index, and querying with embedding-provided searches (fixes #244).
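Conceptually, querying with your own embeddings means ranking stored vectors by their similarity to a query vector produced by the same model. The brute-force sketch below illustrates only that idea. It assumes cosine similarity as the metric, the names `cosine` and `top_k` are hypothetical, and it is not how Moss indexes or scores documents internally.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to query_vec, best first.

    The query must be embedded with the same model (and dimension) as the
    documents; vectors from different models are not comparable.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Brute force costs O(n·d) per query for n documents of dimension d, which is fine for a fixture-sized corpus; a real index exists to avoid that full scan at scale.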
Changes:
- Added an interactive BYO embeddings example script that embeds with OpenAI, creates/loads a Moss index, and runs a small REPL for search + add.
- Added a Locomo conversation JSON fixture used as an ingestion corpus for the example.
- Updated the Python examples `.env.template` to include `OPENAI_API_KEY`.
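One plausible way to turn the conversation fixture into an ingestion corpus is one document per dialogue turn. The sketch assumes locomo_sample0.json follows the public LoCoMo layout (a `conversation` object holding `session_N` lists of turns with `speaker`, `dia_id` and `text` fields); that layout is an assumption not confirmed by this PR, and `conversation_to_docs` / `load_docs` are hypothetical names.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Matches "session_1", "session_12", ... but not metadata keys like "session_1_date_time".
_SESSION_KEY = re.compile(r"^session_(\d+)$")


def conversation_to_docs(sample: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Flatten one LoCoMo-style sample (assumed layout) into (doc_id, text) pairs."""
    conversation = sample["conversation"]

    # Collect session turn lists and order them numerically (session_2 before session_10).
    sessions = []
    for key, turns in conversation.items():
        match = _SESSION_KEY.match(key)
        if match and isinstance(turns, list):
            sessions.append((int(match.group(1)), turns))
    sessions.sort(key=lambda pair: pair[0])

    docs: List[Tuple[str, str]] = []
    for session_no, turns in sessions:
        for i, turn in enumerate(turns):
            # Prefer the dataset's own dialogue id; fall back to a positional id.
            doc_id = turn.get("dia_id") or f"s{session_no}-t{i}"
            docs.append((doc_id, f"{turn['speaker']}: {turn['text']}"))
    return docs


def load_docs(path: str) -> List[Tuple[str, str]]:
    """Read the JSON fixture from disk and flatten it."""
    with open(path, encoding="utf-8") as f:
        return conversation_to_docs(json.load(f))
```

Prefixing each turn with its speaker keeps "who said what" inside the embedded text, which matters for questions about a specific participant.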
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| examples/python/byok_oai.py | New interactive BYO embeddings + OpenAI embedding/chat demo using Moss custom embeddings. |
| examples/python/locomo_sample0.json | New sample conversation corpus for indexing/query demo. |
| examples/python/.env.template | Adds OpenAI API key placeholder for running the example. |
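The example embeds text with OpenAI before handing the vectors to Moss. A minimal batching helper might look like the sketch below, assuming the openai v1 Python client (`client.embeddings.create(model=..., input=[...])`, whose response items expose `.embedding` and `.index`). The default model name and batch size are assumptions, and `embed_texts` is a hypothetical name, not the PR's `_embed`.

```python
from typing import Any, List, Sequence


def embed_texts(
    client: Any,
    texts: Sequence[str],
    model: str = "text-embedding-3-small",  # assumption: the PR may use a different model
    batch_size: int = 100,  # assumption: any size under the API's per-request limit works
) -> List[List[float]]:
    """Embed texts in batches and return one vector per input, in input order.

    `client` is expected to be an `openai.OpenAI` instance, or anything exposing
    `client.embeddings.create(model=..., input=[...])` with the same response shape.
    """
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start : start + batch_size])
        response = client.embeddings.create(model=model, input=batch)
        # Each item records the position of its input; sort defensively so the
        # output lines up with `texts` even if items arrive reordered.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

With the real SDK this would be called as `embed_texts(OpenAI(), texts)`; `OpenAI()` reads `OPENAI_API_KEY` from the environment. Whichever model is chosen, queries must be embedded with that same model.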
```python
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam
```
```python
load_dotenv(Path(__file__).resolve().parent / ".env", override=True)
MOSS_PROJECT_ID = os.environ["MOSS_PROJECT_ID"]
MOSS_PROJECT_KEY = os.environ["MOSS_PROJECT_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
```
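Direct `os.environ["..."]` lookups stop at the first missing key with a bare `KeyError`. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper, not part of the PR, it treats empty values as missing, and its hint text assumes users create `.env` by copying `.env.template`.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested environment variables, reporting every missing one at once."""
    wanted = list(names)
    # An unset variable and one set to "" are both treated as missing.
    missing = [name for name in wanted if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: "
            + ", ".join(missing)
            + " (copy .env.template to .env and fill them in)"
        )
    return {name: os.environ[name] for name in wanted}
```

Usage would be along the lines of `cfg = require_env(["MOSS_PROJECT_ID", "MOSS_PROJECT_KEY", "OPENAI_API_KEY"])`, called after `load_dotenv(...)`.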
```python
self.docs = {d.id: d.text for d in docs}
```
```python
await self.moss.load_index(self.index_name)
result = await self.moss.add_docs(
    self.index_name,
    [DocumentInfo(id=new_id, text=text, embedding=vec)],
    MutationOptions(upsert=True),
)
```
```diff
 MOSS_PROJECT_ID=your_project_id_here
-MOSS_PROJECT_KEY=your_project_key_here
\ No newline at end of file
+MOSS_PROJECT_KEY=your_project_key_here
+##LLM API key
```
```python
async def add(self, text: str) -> None:
    new_id = f"manual-{uuid.uuid4().hex[:8]}"
    vec = self._embed([text])[0]
    result = await self.moss.add_docs(
        self.index_name,
        [DocumentInfo(id=new_id, text=text, embedding=vec)],
        MutationOptions(upsert=True),
    )
    self.docs[new_id] = text
    print(
        f" added [{new_id}] (job={result.job_id}, total docs={result.doc_count})"
    )
```
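Because `add()` gives every document a fresh random `uuid4` suffix, `MutationOptions(upsert=True)` can only ever insert, assuming Moss matches upserts by document id: re-adding the same sentence creates a duplicate. A minimal sketch of a content-derived id follows; `doc_id_for` is a hypothetical helper, and SHA-256 truncated to 16 hex characters is an arbitrary choice.

```python
import hashlib


def doc_id_for(text: str, prefix: str = "manual") -> str:
    """Derive a stable id from the text itself.

    Whitespace is trimmed and collapsed first, so trivially different copies of the
    same sentence map to the same id and an upsert replaces rather than duplicates.
    """
    normalized = " ".join(text.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}"
```

Random ids remain the better choice if the demo should deliberately allow storing the same sentence twice.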
🚩 Locally loaded index not refreshed after :add command
After setup() calls self.moss.load_index(self.index_name) at examples/python/byok_oai.py:92 or examples/python/byok_oai.py:126, all subsequent queries route through the local in-memory index (see sdks/python/sdk/src/moss/client/moss_client.py:198-201). The add method at examples/python/byok_oai.py:168-179 calls add_docs which sends the document to the cloud, but never reloads the local index. This means documents added via :add during the REPL session won't appear in subsequent query results until the process is restarted. The in-memory self.docs cache is updated (so :list and :count reflect the addition), which could mislead users into thinking the doc is searchable. This is an inherent architectural characteristic (mutations go to cloud, local index is a snapshot), but worth noting for an interactive demo that invites users to add and immediately search.
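The behaviour described in this comment can be reproduced with a toy model. `SnapshotStore` below is a hypothetical in-memory stand-in, not the Moss SDK: writes land in a remote copy and reads come from a snapshot taken by `load_index()`, so a document added without reloading stays invisible to queries.

```python
import asyncio
from typing import Dict, List, Set, Tuple


class SnapshotStore:
    """Toy model: mutations go to `remote`, reads are served from the `local` snapshot."""

    def __init__(self) -> None:
        self.remote: Dict[str, Dict[str, str]] = {}
        self.local: Dict[str, Dict[str, str]] = {}

    async def add_docs(self, index_name: str, docs: List[Tuple[str, str]]) -> None:
        self.remote.setdefault(index_name, {}).update(docs)

    async def load_index(self, index_name: str) -> None:
        # Copy the current remote state; later writes are not reflected here.
        self.local[index_name] = dict(self.remote.get(index_name, {}))

    def searchable_ids(self, index_name: str) -> Set[str]:
        return set(self.local.get(index_name, {}))


async def add_then_refresh(
    store: SnapshotStore, index_name: str, doc_id: str, text: str, refresh: bool = True
) -> None:
    """Add one document, then optionally reload the snapshot so queries can see it."""
    await store.add_docs(index_name, [(doc_id, text)])
    if refresh:
        await store.load_index(index_name)


async def demo() -> Tuple[bool, bool]:
    store = SnapshotStore()
    await store.load_index("convo")
    await add_then_refresh(store, "convo", "manual-1", "first", refresh=False)
    stale = "manual-1" in store.searchable_ids("convo")
    await add_then_refresh(store, "convo", "manual-2", "second", refresh=True)
    fresh = "manual-2" in store.searchable_ids("convo")
    return stale, fresh


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (False, True)
```

In byok_oai.py the analogous change would be to call `self.moss.load_index(self.index_name)` again at the end of `add()`. One caveat: the real `add_docs` returns a `job_id`, which suggests ingestion may finish asynchronously, so an immediate reload is not guaranteed to include the new document.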
Pull Request Checklist
Please ensure that your PR meets the following requirements:
Description
Added a bring-your-own-embeddings example that uses an OpenAI embedding model.
Fixes #244
Type of Change