Merge EGW functionality into Data API repo #2387

@tatu-at-datastax

This is the umbrella issue for work to deprecate the separate https://github.com/Riptano/embedding-gateway repo by merging its functionality into Data API, so that it runs within the same process.

At a high level, EGW exposes 5 RPC methods: 3 for embeddings and 2 for reranking:

EmbeddingService (EmbeddingServiceImpl)

  1. embed — Vectorize text. Takes a provider name, model, list of input texts, and auth tokens; returns float embeddings.
  2. getSupportedProviders — Returns the full catalog of configured embedding providers, their models, auth requirements, parameters, and request properties.
  3. validateCredential — Validates a credential/API key against SyncService for a given provider and tenant.

RerankingService (RerankingServiceImpl)

  1. rerank — Re-scores a list of text passages against a query using a reranking model. Returns ranked indices with scores.
  2. getSupportedRerankingProviders — Returns the catalog of configured reranking providers, their models, auth configs, and request properties.

Of these, Data API can already handle embed and rerank in "embedded mode" (calling providers directly), as well as getSupportedProviders and getSupportedRerankingProviders (loading local configs), as long as the setting --stargate.jsonapi.operations.enableEmbeddingGateway is disabled (the default). This is already how ITs run. That leaves validateCredential as the one thing to port over.

So the flag affects three areas:

  • Config loading (EmbeddingProvidersConfigProducer): when disabled, uses the default embedded config rather than fetching from the gateway.
  • Credential validation (VectorizeConfigValidator:192): credential validation against the gateway only runs when enabled.
  • Reranking (RerankingProviderFactory): same flag controls whether reranking uses the gateway or direct HTTP.
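
The three flag-dependent behaviours above can be sketched roughly as follows. This is a minimal illustration only: `GatewayToggleSketch`, `Route`, and every method name are hypothetical and do not come from the Data API codebase; the boolean parameter stands in for the enableEmbeddingGateway setting.

```java
import java.util.function.Supplier;

// All names here are hypothetical; they only mirror the three bullets above.
class GatewayToggleSketch {

    /** Where a call is routed, depending on the flag value. */
    enum Route { GATEWAY, EMBEDDED }

    /** Reranking and embedding calls: gateway when enabled, direct provider calls otherwise. */
    static Route routeFor(boolean enableEmbeddingGateway) {
        return enableEmbeddingGateway ? Route.GATEWAY : Route.EMBEDDED;
    }

    /** Config loading: fetch from the gateway only when enabled, else use the embedded default. */
    static String loadProviderConfig(
            boolean enableEmbeddingGateway,
            Supplier<String> fromGateway,
            Supplier<String> embeddedDefault) {
        return enableEmbeddingGateway ? fromGateway.get() : embeddedDefault.get();
    }

    /** Credential validation against the gateway is skipped when the flag is off. */
    static boolean validatesAgainstGateway(boolean enableEmbeddingGateway) {
        return enableEmbeddingGateway;
    }

    public static void main(String[] args) {
        boolean flag = false; // the default, per the issue text
        System.out.println(routeFor(flag));
        System.out.println(loadProviderConfig(flag, () -> "gateway-config", () -> "embedded-config"));
        System.out.println(validatesAgainstGateway(flag));
    }
}
```

Once the gateway is merged in, the `GATEWAY` branch would presumably disappear and only the embedded path would remain, with credential validation moved in-process.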

In addition, retry logic and extra logging may need to be added on the Data API side.

Tentative work:

  1. Add the io.stargate.embedding.gateway.secrets package, and especially SyncServiceClient, to the Data API codebase; change the code in VectorizeConfigValidator (around line 192) to use it directly instead of going through gRPC
  2. Check if retry logic needs to be improved
    • See io.stargate.embedding.gateway.ratelimit for error-based rate-limiting in particular
  3. Check if extra logging needs adding (see io.stargate.embedding.gateway.logging.EmbeddingGatewayLog)
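
As a rough illustration of steps 1 and 2, the sketch below shows credential validation done in-process with a simple bounded retry around it. Everything in it is an assumption for illustration: `CredentialLookup` is a hypothetical stand-in for SyncServiceClient (whose real API is not shown in this issue), `withRetry` is a generic helper rather than the gateway's existing retry or rate-limiting code, and the tenant, provider, and key values are placeholders.

```java
import java.util.function.Supplier;

// Hypothetical sketch; no type here is taken from the Data API or EGW code.
class InProcessValidationSketch {

    /** Stand-in for SyncServiceClient: answers whether a named credential exists for a tenant and provider. */
    interface CredentialLookup {
        boolean credentialExists(String tenant, String provider, String credentialName);
    }

    /** Calls the supplier up to maxAttempts times and rethrows the last failure if all attempts fail. */
    static <T> T withRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Direct, in-process validation: no gRPC hop, just the lookup wrapped in a retry. */
    static boolean validateCredential(
            CredentialLookup lookup, String tenant, String provider, String credentialName, int maxAttempts) {
        return withRetry(() -> lookup.credentialExists(tenant, provider, credentialName), maxAttempts);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice with a transient error, then answers normally.
        CredentialLookup flaky = (tenant, provider, name) -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "my-key".equals(name);
        };
        System.out.println(validateCredential(flaky, "tenant-1", "some-provider", "my-key", 3));
    }
}
```

A real version would likely distinguish retryable from non-retryable errors and add backoff, which is where the existing io.stargate.embedding.gateway.ratelimit code may be relevant.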
