What to build
The runtime auto-selection must handle cases where multiple runtimes overlap on both their supported model format and their model size range. If overlapping runtimes have autoSelect: true, we must ensure that their priorities are explicitly different. When a user creates an inference service without explicitly specifying a runtime, exactly one runtime, the one with the highest priority, must be auto-selected.
Acceptance criteria
- ClusterServingRuntimeValidator and ServingRuntimeValidator webhooks must ensure that if multiple runtimes (even cross-scope between ServingRuntime and ClusterServingRuntime) have autoSelect: true and overlap on SupportedModelFormat and ModelSizeRange, they have different priorities.
- The runtime selector (pkg/runtimeselector/) correctly honors this priority and strictly auto-selects the runtime with the highest priority when overlaps occur.
Blocked by
None - can start immediately
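Implementation sketch
The two acceptance criteria could be approached roughly as follows. This is a minimal sketch, not the project's actual code: the Runtime struct, overlaps, ValidateDistinctPriority, and SelectRuntime are hypothetical names, the size range is modeled as an inclusive numeric interval in arbitrary units, and a larger Priority value is assumed to mean higher priority.

```go
package main

import "fmt"

// Runtime is a hypothetical, flattened stand-in for the fields of a
// ServingRuntime or ClusterServingRuntime that matter for auto-selection.
type Runtime struct {
	Name       string
	Format     string  // supported model format
	MinSize    float64 // lower bound of the model size range (inclusive)
	MaxSize    float64 // upper bound of the model size range (inclusive)
	Priority   int     // assumption: larger value means higher priority
	AutoSelect bool
}

// overlaps reports whether two runtimes support the same model format
// and their size ranges intersect.
func overlaps(a, b Runtime) bool {
	return a.Format == b.Format && a.MinSize <= b.MaxSize && b.MinSize <= a.MaxSize
}

// ValidateDistinctPriority sketches the check a validating webhook might run
// on create or update. The existing slice is assumed to contain runtimes from
// both scopes. Matching by Name alone to skip the object being updated is a
// simplification; a real check would also compare scope and namespace.
func ValidateDistinctPriority(candidate Runtime, existing []Runtime) error {
	if !candidate.AutoSelect {
		return nil
	}
	for _, rt := range existing {
		if rt.Name == candidate.Name || !rt.AutoSelect {
			continue
		}
		if overlaps(candidate, rt) && rt.Priority == candidate.Priority {
			return fmt.Errorf("runtime %q overlaps with %q at the same priority %d",
				candidate.Name, rt.Name, candidate.Priority)
		}
	}
	return nil
}

// SelectRuntime returns the highest-priority auto-selectable runtime that
// supports the given format and model size. The boolean is false when no
// runtime matches.
func SelectRuntime(runtimes []Runtime, format string, size float64) (Runtime, bool) {
	var best Runtime
	found := false
	for _, rt := range runtimes {
		if !rt.AutoSelect || rt.Format != format || size < rt.MinSize || size > rt.MaxSize {
			continue
		}
		if !found || rt.Priority > best.Priority {
			best, found = rt, true
		}
	}
	return best, found
}
```

Because the validation step rejects equal priorities among overlapping auto-select runtimes, the strict comparison in SelectRuntime never has to break a tie, so the result does not depend on list order.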