Open
Description
This is the umbrella issue for work to deprecate the separate https://github.com/Riptano/embedding-gateway repo by merging its functionality into Data API, so that it runs within the same process.
At a high level, EGW exposes 5 RPC methods, 3 for embeddings and 2 for reranking:
EmbeddingService (EmbeddingServiceImpl)
- (1) embed: Vectorize text. Takes a provider name, model, list of input texts, and auth tokens; returns float embeddings.
- (2) getSupportedProviders: Returns the full catalog of configured embedding providers, their models, auth requirements, parameters, and request properties.
- (3) validateCredential: Validates a credential/API key against SyncService for a given provider and tenant.
RerankingService (RerankingServiceImpl)
- (4) rerank: Re-scores a list of text passages against a query using a reranking model. Returns ranked indices with scores.
- (5) getSupportedRerankingProviders: Returns the catalog of configured reranking providers, their models, auth configs, and request properties.
Of these, Data API can already handle embed (1) and rerank (4) in "embedded mode" by calling providers directly, as well as getSupportedProviders (2) and getSupportedRerankingProviders (5) by loading local configs, as long as the setting stargate.jsonapi.operations.enableEmbeddingGateway is disabled (the default); this is already the case when running ITs. This leaves (3), validateCredential, as the one thing to port over.
So the flag currently gates:
- Config loading (EmbeddingProvidersConfigProducer): when disabled, uses the default embedded config rather than fetching from the gateway.
- Credential validation (VectorizeConfigValidator:192): credential validation against the gateway only runs when enabled.
- Reranking (RerankingProviderFactory): same flag controls whether reranking uses the gateway or direct HTTP.
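The flag-based switching described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual Data API code: `ProviderConfigSource`, `select`, and the provider names are hypothetical stand-ins, and the real classes (EmbeddingProvidersConfigProducer, RerankingProviderFactory) may be wired differently.

```java
import java.util.List;
import java.util.function.Supplier;

public class GatewayFlagSketch {
    /** Hypothetical stand-in for a source of provider config; not a Data API type. */
    interface ProviderConfigSource {
        List<String> providerNames();
    }

    /**
     * Picks the gateway-backed source when the flag is enabled, otherwise the
     * embedded (in-process) one. Suppliers keep the unused path from being built.
     */
    static ProviderConfigSource select(boolean enableEmbeddingGateway,
                                       Supplier<ProviderConfigSource> gateway,
                                       Supplier<ProviderConfigSource> embedded) {
        return enableEmbeddingGateway ? gateway.get() : embedded.get();
    }

    public static void main(String[] args) {
        // Made-up provider names, for illustration only.
        ProviderConfigSource embedded = () -> List.of("embedded-provider");
        ProviderConfigSource gateway = () -> List.of("gateway-provider");

        // Flag disabled (the default per this issue): the embedded config is used.
        System.out.println(select(false, () -> gateway, () -> embedded).providerNames());
    }
}
```

Once the gateway is removed, the `gateway` branch and the flag itself would presumably disappear, leaving only the embedded path.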
In addition, retry logic and extra logic may need to be added on the Data API side.
Tentative work:
- Add io.stargate.embedding.gateway.secrets package and esp. SyncServiceClient on Data API codebase, change code in VectorizeConfigValidator (around line 192) to use that directly, instead of gRPC
- Check if retry logic needs to be improved
  - See io.stargate.embedding.gateway.ratelimit for error-based rate-limiting in particular
- Check if extra logging needs adding (see io.stargate.embedding.gateway.logging.EmbeddingGatewayLog)
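As a starting point for the retry discussion, here is a minimal sketch of retry with exponential backoff around an in-process call such as a credential check. It is not the logic in io.stargate.embedding.gateway.ratelimit, which would need to be reviewed separately; `RetrySketch`, `RetryableException`, and `callWithRetry` are hypothetical names, and the attempt count and delays are arbitrary.

```java
import java.util.function.Supplier;

public class RetrySketch {
    /** Hypothetical marker for a failure worth retrying, e.g. a rate-limit response. */
    static class RetryableException extends RuntimeException {
        RetryableException(String message) { super(message); }
    }

    /**
     * Calls the supplier up to maxAttempts times, doubling the delay after each
     * retryable failure. Non-retryable exceptions propagate immediately.
     */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RetryableException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", ie);
                }
                delayMs *= 2;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated credential check that is rate-limited twice, then succeeds.
        boolean valid = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RetryableException("rate limited");
            }
            return true;
        }, 5, 10);
        System.out.println(valid + " after " + calls[0] + " attempts");
    }
}
```

Data API is built on reactive types, so a real implementation would more likely use a non-blocking retry operator than `Thread.sleep`; the blocking version here only keeps the sketch self-contained.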