[BUG] BestFit accelerator policy without constraints causes controller panic #627

Description

@janekmichalik

What happened?

Creating an InferenceService with acceleratorSelector.policy: BestFit and no constraints causes the OME controller to panic during reconciliation.

The controller successfully finds the runtime, fetches accelerator candidates, and filters them:

Fetched candidate accelerators {"count": 3}
Filtered candidates {"total": 3, "eligible": 3}
Candidates after filtering {"count": 3, "policy": "BestFit"}

It then panics with:

runtime error: invalid memory address or nil pointer dereference

The stack trace points to:

github.com/sgl-project/ome/pkg/acceleratorclassselector.calculateMemoryFitScore
/workspace/pkg/acceleratorclassselector/policy_helpers.go:200

github.com/sgl-project/ome/pkg/acceleratorclassselector.calculateBestFitScore
/workspace/pkg/acceleratorclassselector/policy_helpers.go:187

github.com/sgl-project/ome/pkg/acceleratorclassselector.(*defaultSelector).selectBestFit
/workspace/pkg/acceleratorclassselector/selector.go:281
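
The trace suggests that the memory score is computed from a value that is nil when no constraints are set. The actual code in policy_helpers.go is not shown here, so the following is only a minimal sketch of a nil-safe score helper. The Constraints type, its MinMemory field, and the memoryFitScore function are hypothetical names, not the OME API.

```go
package main

import "fmt"

// Constraints is a hypothetical stand-in for the selector constraints.
// MinMemory is a pointer so that "unset" can be told apart from zero.
type Constraints struct {
	MinMemory *int64 // in GB; nil when the user gave no constraint
}

// memoryFitScore returns a score in [0, 1], where a higher value means a
// tighter fit. ok is false when there is nothing to score against, so the
// caller can skip the memory term instead of dereferencing a nil pointer.
func memoryFitScore(c *Constraints, availableGB int64) (score float64, ok bool) {
	if c == nil || c.MinMemory == nil || availableGB <= 0 {
		return 0, false
	}
	if availableGB < *c.MinMemory {
		return 0, true // candidate is too small
	}
	return float64(*c.MinMemory) / float64(availableGB), true
}

func main() {
	// No constraints at all: must not panic.
	s, ok := memoryFitScore(nil, 192)
	fmt.Println(s, ok)

	need := int64(384)
	s, ok = memoryFitScore(&Constraints{MinMemory: &need}, 768)
	fmt.Println(s, ok)
}
```

Returning an explicit ok flag rather than a sentinel score keeps the "no constraint" case visible to the caller, which can then decide between a fallback and an error.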

What did you expect to happen?

OME should not panic.

Expected behavior should be one of:

  1. BestFit selects the best-fitting AcceleratorClass from the runtime candidate list (possibly using model metadata to size the fit), or
  2. OME returns a clear reconciliation error/status condition saying that BestFit requires explicit constraints such as minMemory.
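
Both options can be sketched as follows. This is an illustration under assumptions, not the OME implementation: Candidate, validateBestFit, selectBestFit, and ErrBestFitNeedsConstraints are hypothetical names, and "smallest candidate that satisfies the constraint" is one possible reading of what BestFit should mean when no constraints are given.

```go
package main

import (
	"errors"
	"fmt"
)

// Candidate is a hypothetical, simplified view of an AcceleratorClass.
type Candidate struct {
	Name     string
	MemoryGB int64
}

// ErrBestFitNeedsConstraints is a hypothetical error for option 2.
var ErrBestFitNeedsConstraints = errors.New("BestFit requires explicit constraints such as minMemory")

// validateBestFit sketches option 2: reject the request with a clear error
// that the reconciler could surface as a status condition.
func validateBestFit(minMemoryGB *int64) error {
	if minMemoryGB == nil {
		return ErrBestFitNeedsConstraints
	}
	return nil
}

// selectBestFit sketches option 1: a nil constraint is treated as "no
// minimum", and the smallest candidate that meets the minimum is chosen.
func selectBestFit(cands []Candidate, minMemoryGB *int64) (Candidate, error) {
	var need int64
	if minMemoryGB != nil {
		need = *minMemoryGB
	}
	var best *Candidate
	for i := range cands {
		c := &cands[i]
		if c.MemoryGB < need {
			continue // does not satisfy the constraint
		}
		if best == nil || c.MemoryGB < best.MemoryGB {
			best = c
		}
	}
	if best == nil {
		return Candidate{}, errors.New("no candidate satisfies the constraints")
	}
	return *best, nil
}

func main() {
	cands := []Candidate{{"gb200-1gpu", 192}, {"gb200-2gpu", 384}, {"gb200-4gpu", 768}}

	c, err := selectBestFit(cands, nil)
	fmt.Println(c.Name, err)

	need := int64(384)
	c, err = selectBestFit(cands, &need)
	fmt.Println(c.Name, err)

	fmt.Println(validateBestFit(nil))
}
```

With the three candidates from this report, the sketch picks gb200-2gpu for a minimum of 384, matching the behaviour observed with explicit constraints, and falls back to gb200-1gpu when no minimum is given.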

How can we reproduce it (as minimally and precisely as possible)?

Create a ClusterServingRuntime with valid accelerator candidates:

apiVersion: ome.io/v1beta1
kind: ClusterServingRuntime
metadata:
  name: lab-sglang-gb200-candidates
spec:
  disabled: false
  acceleratorRequirements:
    acceleratorClasses:
      - gb200-1gpu
      - gb200-2gpu
      - gb200-4gpu
  supportedModelFormats:
    - modelFramework:
        name: transformers
        version: "4.51.0"
      modelFormat:
        name: safetensors
        version: "1.0.0"
      modelArchitecture: Qwen3ForCausalLM
      autoSelect: false
      priority: 1
  protocolVersions:
    - openAI
  modelSizeRange:
    min: 0.5B
    max: 1B
  engineConfig:
    runner:
      name: ome-container
      image: docker.io/lmsysorg/sglang:dev-cu13
      command:
        - sh
        - -c
        - sleep infinity
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

Create AcceleratorClass objects:

apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-1gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 192Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "1"
---
apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-2gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 384Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "2"
---
apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-4gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 768Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "4"

Create an InferenceService using BestFit without constraints:

apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: lab-policy-bestfit
  namespace: ome-lab
spec:
  acceleratorSelector:
    policy: BestFit
  model:
    name: qwen3-0-6b
  runtime:
    name: lab-sglang-gb200-candidates
  engine:
    minReplicas: 1
    maxReplicas: 1

Anything else we need to know?

Using BestFit with explicit constraints works:

acceleratorSelector:
  policy: BestFit
  constraints:
    minMemory: 384

With the above constraint, OME filters candidates correctly and selects gb200-2gpu.

Important detail: AcceleratorClass.spec.capabilities.memoryGB must be specified as a Kubernetes quantity, for example:

memoryGB: 384Gi

Using:

memoryGB: "384"

is interpreted as a byte count (384 bytes) and causes memory filtering to fail.
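
The difference between the two spellings can be illustrated with a small stand-alone parser. This is a simplified stand-in written for this report, not the Kubernetes quantity parser and not OME code: parseBytes is a hypothetical helper that only understands plain integers and the binary suffixes Ki, Mi, Gi, and Ti. How OME converts the parsed value before comparing it with minMemory is an assumption here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBytes converts a quantity string to bytes. A value without a suffix
// is taken as a raw byte count, which is why "384" ends up far below 384 GiB.
func parseBytes(s string) (int64, error) {
	suffixes := []struct {
		name string
		mult int64
	}{
		{"Ki", 1 << 10},
		{"Mi", 1 << 20},
		{"Gi", 1 << 30},
		{"Ti", 1 << 40},
	}
	for _, suf := range suffixes {
		if strings.HasSuffix(s, suf.name) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, suf.name), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * suf.mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	for _, v := range []string{"384", "384Gi"} {
		b, err := parseBytes(v)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		// b>>30 is the value in whole GiB, rounded down.
		fmt.Printf("%-6s -> %d bytes (%d GiB)\n", v, b, b>>30)
	}
}
```

Under this reading, a class declared with memoryGB: "384" reports 0 whole GiB, so a minMemory: 384 filter would exclude it.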

Environment

  • OME version: 0.1.5
  • Kubernetes version (use kubectl version): v1.32.8
  • Cloud provider or hardware configuration: 2 GPU nodes, 4 × NVIDIA GB200 GPUs each
  • OS: Ubuntu 24.04.4 LTS
  • Runtime: SGLang image docker.io/lmsysorg/sglang:dev-cu13
  • Model being served: Qwen/Qwen3-0.6B
  • Install method: Helm OCI from the docs
