You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys (#3813)
# Add support for Google Gemini `gemini-embedding-001` embedding model
and correctly registers model type
MR message created with the assistance of Claude-4.5-sonnet
This resolves#3755
## What does this PR do?
This PR adds support for the `gemini-embedding-001` Google embedding
model to the llama-stack Gemini provider. This model provides
high-dimensional embeddings (3072 dimensions) compared to the existing
`text-embedding-004` model (768 dimensions). Old embeddings models (such
as text-embedding-004) will be deprecated soon according to Google
([Link](https://developers.googleblog.com/en/gemini-embedding-available-gemini-api/))
## Problem
The Gemini provider only supported the `text-embedding-004` embedding
model. The newer `gemini-embedding-001` model, which provides
higher-dimensional embeddings for improved semantic representation, was
not available through llama-stack.
## Solution
This PR consists of three commits that implement, fix the model
registration, and enable embedding generation:
### Commit 1: Initial addition of gemini-embedding-001
Added metadata for `gemini-embedding-001` to the
`embedding_model_metadata` dictionary:
```python
embedding_model_metadata: dict[str, dict[str, int]] = {
"text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
"gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048}, # NEW
}
```
**Issue discovered:** The model was not being registered correctly
because the dictionary keys didn't match the model IDs returned by
Gemini's API.
### Commit 2: Fix model ID matching with `models/` prefix
Updated both dictionary keys to include the `models/` prefix to match
Gemini's OpenAI-compatible API response format:
```python
embedding_model_metadata: dict[str, dict[str, int]] = {
"models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048}, # UPDATED
"models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048}, # UPDATED
}
```
**Root cause:** Gemini's OpenAI-compatible API returns model IDs with
the `models/` prefix (e.g., `models/text-embedding-004`). The
`OpenAIMixin.list_models()` method directly matches these IDs against
the `embedding_model_metadata` dictionary keys. Without the prefix, the
models were being registered as LLMs instead of embedding models.
### Commit 3: Fix embedding generation for providers without usage stats
Fixed a bug in `OpenAIMixin.openai_embeddings()` that prevented
embedding generation for providers (like Gemini) that don't return usage
statistics:
```python
# Before (Line 351-354):
usage = OpenAIEmbeddingUsage(
prompt_tokens=response.usage.prompt_tokens, # ← Crashed with AttributeError
total_tokens=response.usage.total_tokens,
)
# After (Lines 351-362):
if response.usage:
usage = OpenAIEmbeddingUsage(
prompt_tokens=response.usage.prompt_tokens,
total_tokens=response.usage.total_tokens,
)
else:
usage = OpenAIEmbeddingUsage(
prompt_tokens=0, # Default when not provided
total_tokens=0, # Default when not provided
)
```
**Impact:** This fix enables embedding generation for **all** Gemini
embedding models, not just the newly added one.
## Changes
### Modified Files
**`llama_stack/providers/remote/inference/gemini/gemini.py`**
- Line 17: Updated `text-embedding-004` key to
`models/text-embedding-004`
- Line 18: Added `models/gemini-embedding-001` with correct metadata
**`llama_stack/providers/utils/inference/openai_mixin.py`**
- Lines 351-362: Added null check for `response.usage` to handle
providers without usage statistics
## Key Technical Details
### Model ID Matching Flow
1. `list_provider_model_ids()` calls Gemini's `/v1/models` endpoint
2. API returns model IDs like: `models/text-embedding-004`,
`models/gemini-embedding-001`
3. `OpenAIMixin.list_models()` (line 410) checks: `if metadata :=
self.embedding_model_metadata.get(provider_model_id)`
4. If matched, registers as `model_type: "embedding"` with metadata;
otherwise registers as `model_type: "llm"`
### Why Both Keys Needed the Prefix
The `text-embedding-004` model was already working because there was
likely separate configuration or manual registration handling it. For
auto-discovery to work correctly for **both** models, both keys must
match the API's model ID format exactly.
## How to test this PR
Verified the changes by:
1. **Model Auto-Discovery**: Started llama-stack server and confirmed
models are auto-discovered from Gemini API
2. **Model Registration**: Confirmed both embedding models are correctly
registered and visible
```bash
curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'
```
**Results:**
- ✅ `gemini/models/text-embedding-004` - 768 dimensions - `model_type:
"embedding"`
- ✅ `gemini/models/gemini-embedding-001` - 3072 dimensions -
`model_type: "embedding"`
3. **Before Fix (Commit 1)**: Models appeared as `model_type: "llm"`
without embedding metadata
4. **After Fix (Commit 2)**: Models correctly identified as `model_type:
"embedding"` with proper metadata
5. **Generate Embeddings**: Verified embedding generation works
```bash
curl -X POST http://localhost:8325/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
jq '.data[0].embedding | length'
```
0 commit comments