Add router-visible PD Engine/Decoder pair identity #626

Description

@YouNeedCryDear

What to build

Add router-visible identity metadata for RDMA-local PD Engine/Decoder pairs. Once topology-aware placement defines the selected topology domain, OME should label or annotate Engine and Decoder pods so the router can discover matched prefill/decode pairs and avoid selecting an Engine from one rack or RDMA fabric with a Decoder from another.

The router-facing contract should keep placement and routing concerns separate: scheduling decides where pods land, while the router consumes stable pair and topology metadata to pick a local pair for prefill/decode work. This should support current service-discovery selector flows and leave room for future topology-aware routing or KV-transfer policies.
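As a rough illustration of the router side of this contract, the sketch below groups discovered pods by a pair label and only returns an Engine and Decoder that share the same pair identity. The label key `example.ome.io/pd-pair-id`, the pod dictionary shape, and the `select_pair` helper are all hypothetical; the issue does not define them. It also assumes the router falls back to independent pools only when no pod carries pair metadata at all.

```python
from collections import defaultdict

# Hypothetical label key; the issue does not name the actual key.
PAIR_ID_LABEL = "example.ome.io/pd-pair-id"


def select_pair(pods):
    """Pick an (engine, decoder, pair_id) triple from discovered pods.

    Each pod is a dict with "name", "role" ("engine" or "decoder"),
    and an optional "labels" dict. This shape is an assumption.
    """
    engines, decoders = [], []
    engines_by_pair = defaultdict(list)
    decoders_by_pair = defaultdict(list)

    for pod in pods:
        if pod["role"] == "engine":
            pool, by_pair = engines, engines_by_pair
        else:
            pool, by_pair = decoders, decoders_by_pair
        pool.append(pod["name"])
        pair_id = pod.get("labels", {}).get(PAIR_ID_LABEL)
        if pair_id:
            by_pair[pair_id].append(pod["name"])

    # Pair metadata is present: only hand off within a matched pair.
    if engines_by_pair or decoders_by_pair:
        matched = sorted(set(engines_by_pair) & set(decoders_by_pair))
        if not matched:
            return None  # no local pair available; do not cross domains
        pair_id = matched[0]  # deterministic pick, not load balancing
        return engines_by_pair[pair_id][0], decoders_by_pair[pair_id][0], pair_id

    # Pair metadata is absent: keep the existing independent-pool behavior.
    if engines and decoders:
        return engines[0], decoders[0], None
    return None
```

A real router would presumably balance load across matched pairs rather than always taking the first one; the fixed choice here only keeps the sketch deterministic.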

Acceptance criteria

  • Engine and Decoder pods expose a stable pair identity that is scoped to the InferenceService and selected topology domain.
  • Pods expose the selected topology domain key/value in router-consumable labels or annotations.
  • Router discovery config can select matched Engine/Decoder pairs instead of only independent Engine and Decoder pools.
  • The router can avoid cross-rack or cross-fabric prefill/decode handoff when pair metadata is present.
  • Behavior is backwards compatible when pair metadata is absent or the RDMA locality policy is disabled.
  • The pair identity is stable across ordinary reconciles and changes only when placement/topology changes require it.
  • Tests cover label/annotation generation, router selector/config generation, missing metadata fallback, and multiple pair/topology-domain scenarios.
  • Docs explain how router pairing relates to RDMA placement, what metadata is produced, and how operators can inspect the selected pairs.
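One possible way to satisfy the stability and scoping criteria above is to derive the pair identity deterministically from the InferenceService and the selected topology domain, so that repeated reconciles with unchanged inputs produce the same value. The sketch below is a minimal illustration under that assumption, not the OME implementation. The metadata keys and the `pair_identity` and `pair_metadata` helpers are hypothetical. The topology key is placed in an annotation because keys such as `topology.kubernetes.io/zone` contain `/`, which Kubernetes does not allow in label values.

```python
import hashlib

# Hypothetical metadata keys; the issue does not name them.
PAIR_ID_LABEL = "example.ome.io/pd-pair-id"
TOPOLOGY_VALUE_LABEL = "example.ome.io/pd-topology-value"
TOPOLOGY_KEY_ANNOTATION = "example.ome.io/pd-topology-key"


def pair_identity(namespace, isvc_name, topology_key, topology_value):
    """Derive a pair identity from the InferenceService and topology domain.

    The same inputs always hash to the same value, so the identity only
    changes when the scope or the selected domain changes.
    """
    raw = "\x00".join([namespace, isvc_name, topology_key, topology_value])
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
    # "pd-" plus 12 hex characters stays well under the 63-character
    # limit for label values and starts and ends with an alphanumeric.
    return f"pd-{digest}"


def pair_metadata(namespace, isvc_name, topology_key, topology_value):
    """Build the labels and annotations to stamp on Engine and Decoder pods."""
    labels = {
        PAIR_ID_LABEL: pair_identity(
            namespace, isvc_name, topology_key, topology_value
        ),
        # Assumed to come from a node label value, so already label-safe.
        TOPOLOGY_VALUE_LABEL: topology_value,
    }
    annotations = {TOPOLOGY_KEY_ANNOTATION: topology_key}
    return labels, annotations
```

With metadata along these lines, an operator could inspect the selected pairs by listing pods with the pair label shown as a column, for example via the `-L` flag of `kubectl get pods`.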

Blocked by
