
Conversation

Ujjwal-Bajpayee

Overview

This PR introduces a minimal Retrieval-Augmented Generation (RAG) example that integrates FAISS-based retrieval with gpt-oss models using Harmony-style prompts.

It is completely self-contained, non-invasive, and designed as an educational reference for ML engineers who want to ground open LLMs in local or private data sources.


🧠 What’s Included

New files only (no core modifications):

  • examples/rag_gpt_oss.py — main example script implementing FAISS indexing, retrieval, and Harmony prompting
  • examples/utils/harmony_helpers.py — helper functions for constructing and validating Harmony-formatted messages
  • examples/requirements-rag.txt — isolated dependencies for RAG example
  • examples/data/ — small local documents for FAISS indexing and retrieval
  • docs/examples/rag_gpt_oss.md — setup and usage guide

⚙️ Key Features

  • FAISS-based semantic search with persistent index (examples/data/.faiss/)
  • SentenceTransformer embeddings (all-MiniLM-L6-v2) for lightweight retrieval
  • Harmony-format chat construction for structured prompts
  • OpenAI-compatible endpoint via environment variables
    • OPENAI_BASE_URL
    • OPENAI_API_KEY
    • GPT_OSS_MODEL
  • Supports streaming and --no-stream inference modes
  • Automatic JSONL logging (examples/data/runs/) with metadata and latency
  • Graceful fallbacks for missing dependencies (clear CLI messages)

🧩 Example Usage

pip install -r examples/requirements-rag.txt

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="dummy"
export GPT_OSS_MODEL="gpt-oss-mini"

python examples/rag_gpt_oss.py --query "Explain FAISS-based vector search" --top_k 3


Expected Output:
[Assistant]
FAISS (Facebook AI Similarity Search) is a library for efficient vector similarity search...
Sources: [1] intro_vector_search.md, [2] embeddings_and_faiss.md

✅ Validation Checklist

Before submitting the PR, the following items have been verified:

  • Scope Safety: Only new files are added under examples/, examples/utils/, examples/data/, and docs/examples/
  • No Core Modifications: No changes made to pyproject.toml, core libraries, or CI configuration
  • Dependency Isolation: All example dependencies are pinned in examples/requirements-rag.txt
  • Env Vars Handling: Script checks and validates:
    • OPENAI_BASE_URL
    • OPENAI_API_KEY
    • GPT_OSS_MODEL
  • Error Handling: Clear, user-friendly errors for missing dependencies or environment variables (exit code 2)
  • Harmony Prompting: Messages constructed and validated via harmony_helpers.py
  • Retrieval Functionality: FAISS index builds, persists, and reuses correctly
  • Inference Modes: Both streaming and --no-stream work as expected
  • Logging: JSONL logs created in examples/data/runs/ with latency and metadata
  • Docs: Usage instructions and setup steps included in docs/examples/rag_gpt_oss.md
  • Formatting: Code formatted with black and checked with ruff (if available)
  • Local Testing: Script tested locally on GPU with transformers and vLLM backends
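The JSONL logging item in the checklist can be sketched as below. The field names (`ts`, `latency_s`, `sources`, and so on) are illustrative assumptions, not the PR's actual schema; only the output directory `examples/data/runs/` comes from the PR description.

```python
# Sketch of appending one run record per line to a JSONL log file.
# Field names are hypothetical; the directory matches the PR description.
import json
import time
from pathlib import Path

run_dir = Path("examples/data/runs")
run_dir.mkdir(parents=True, exist_ok=True)

record = {
    "ts": time.time(),                     # wall-clock timestamp
    "query": "Explain FAISS-based vector search",
    "model": "gpt-oss-mini",
    "latency_s": 1.23,                     # measured per-request latency
    "sources": ["intro_vector_search.md"], # retrieved chunks cited
}

# Append-mode writes keep one JSON object per line across runs.
with (run_dir / "run.jsonl").open("a") as f:
    f.write(json.dumps(record) + "\n")
```

Appending one self-contained JSON object per line keeps logs greppable and lets each run be parsed independently without loading the whole file.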


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +135 to +139
def retrieve(query: str, index, chunks: List[Dict], model_name: str, top_k: int) -> List[Dict]:
    model = SentenceTransformer(model_name)
    qvec = model.encode([query], normalize_embeddings=True)
    D, I = index.search(qvec, top_k)
    results = []


P1: Validate top_k before calling FAISS search

The retrieval function forwards the user-provided top_k directly to index.search without clamping it to the number of indexed chunks or ensuring it is positive. When the corpus is small (e.g., only three chunks) and the CLI is invoked with --top_k 100 or --top_k 0, faiss.IndexFlatIP.search raises a Faiss assertion 'k <= index.ntotal' failed (or similar) before any error handling runs, terminating the program instead of emitting the friendly error messages used elsewhere. Validating top_k against index.ntotal and requiring it to be > 0 would avoid the crash.
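One way to apply the suggested guard is a small validation helper called before `index.search`. This is a sketch, not the PR's code; the function name `safe_top_k` is hypothetical, and the script would translate the `ValueError` into its friendly CLI message and exit code 2.

```python
def safe_top_k(requested: int, ntotal: int) -> int:
    """Clamp a user-supplied top_k to [1, ntotal].

    Raises ValueError for non-positive values so the caller can emit
    a friendly error instead of hitting a FAISS assertion failure.
    """
    if requested <= 0:
        raise ValueError("--top_k must be a positive integer")
    return min(requested, ntotal)

# With only 3 indexed chunks, --top_k 100 is clamped rather than crashing:
print(safe_top_k(100, 3))  # -> 3
```

`index.search(qvec, safe_top_k(top_k, index.ntotal))` then never requests more neighbors than the index contains.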


@Ujjwal-Bajpayee
Author

@simonw @seratch @romainhuet @bojanbabic This PR is designed to help anyone understand how to ground gpt-oss responses in external data using a minimal RAG example.
Please review whenever you get a chance. I’ve verified that it runs locally.
