What happened?
Creating an InferenceService with acceleratorSelector.policy: BestFit and no constraints causes the OME controller to panic during reconciliation.
The controller successfully finds the runtime, fetches accelerator candidates, and filters them:
```
Fetched candidate accelerators {"count": 3}
Filtered candidates {"total": 3, "eligible": 3}
Candidates after filtering {"count": 3, "policy": "BestFit"}
```
It then panics with:
```
runtime error: invalid memory address or nil pointer dereference
```
The stack trace points to:
```
github.com/sgl-project/ome/pkg/acceleratorclassselector.calculateMemoryFitScore
	/workspace/pkg/acceleratorclassselector/policy_helpers.go:200
github.com/sgl-project/ome/pkg/acceleratorclassselector.calculateBestFitScore
	/workspace/pkg/acceleratorclassselector/policy_helpers.go:187
github.com/sgl-project/ome/pkg/acceleratorclassselector.(*defaultSelector).selectBestFit
	/workspace/pkg/acceleratorclassselector/selector.go:281
```
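The trace places the panic inside calculateMemoryFitScore, which is reached from selectBestFit. One plausible cause is that the scoring helper dereferences an optional constraints pointer, and that pointer is nil when no constraints are set. This is an assumption because the OME source is not quoted here. The Go sketch below uses a hypothetical Constraints type and memoryFitScore function, neither of which is OME's real API, to show a nil-safe version that returns a neutral score instead of panicking.

```go
package main

import "fmt"

// Constraints is a hypothetical stand-in for the selector's constraints.
// MinMemoryGiB is nil when the user sets no minMemory.
type Constraints struct {
	MinMemoryGiB *float64
}

// memoryFitScore returns a score in [0, 1] for how tightly a candidate's
// memory fits the requested minimum. Every pointer is checked before it is
// dereferenced, so a nil Constraints yields a neutral score, not a panic.
func memoryFitScore(candidateGiB float64, c *Constraints) float64 {
	if c == nil || c.MinMemoryGiB == nil || candidateGiB <= 0 {
		return 0.5 // neutral: there is no requirement to fit against
	}
	need := *c.MinMemoryGiB
	if candidateGiB < need {
		return 0 // candidate cannot satisfy the request
	}
	return need / candidateGiB // 1.0 means an exact fit
}

func main() {
	want := 384.0
	fmt.Println(memoryFitScore(384, nil))                              // no constraints
	fmt.Println(memoryFitScore(384, &Constraints{MinMemoryGiB: &want})) // exact fit
	fmt.Println(memoryFitScore(768, &Constraints{MinMemoryGiB: &want})) // loose fit
}
```

Returning a neutral score keeps selection deterministic only if ties are broken elsewhere. Rejecting the request up front is the alternative.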
What did you expect to happen?
OME should not panic.
Expected behavior should be one of:
- BestFit selects the best-fitting AcceleratorClass from the runtime's candidate list (or based on model metadata?), or
- OME returns a clear reconciliation error or status condition stating that BestFit requires explicit constraints such as minMemory.
How can we reproduce it (as minimally and precisely as possible)?
Create a ClusterServingRuntime with valid accelerator candidates:
```yaml
apiVersion: ome.io/v1beta1
kind: ClusterServingRuntime
metadata:
  name: lab-sglang-gb200-candidates
spec:
  disabled: false
  acceleratorRequirements:
    acceleratorClasses:
      - gb200-1gpu
      - gb200-2gpu
      - gb200-4gpu
  supportedModelFormats:
    - modelFramework:
        name: transformers
        version: "4.51.0"
      modelFormat:
        name: safetensors
        version: "1.0.0"
      modelArchitecture: Qwen3ForCausalLM
      autoSelect: false
      priority: 1
  protocolVersions:
    - openAI
  modelSizeRange:
    min: 0.5B
    max: 1B
  engineConfig:
    runner:
      name: ome-container
      image: docker.io/lmsysorg/sglang:dev-cu13
      command:
        - sh
        - -c
        - sleep infinity
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
```
Create AcceleratorClass objects:
```yaml
apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-1gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 192Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "1"
---
apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-2gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 384Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "2"
---
apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-4gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 768Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "4"
```
Create an InferenceService using BestFit without constraints:
```yaml
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: lab-policy-bestfit
  namespace: ome-lab
spec:
  acceleratorSelector:
    policy: BestFit
  model:
    name: qwen3-0-6b
  runtime:
    name: lab-sglang-gb200-candidates
  engine:
    minReplicas: 1
    maxReplicas: 1
```
Anything else we need to know?
Using BestFit with explicit constraints works:
```yaml
acceleratorSelector:
  policy: BestFit
  constraints:
    minMemory: 384
```
With the above constraint, OME filters candidates correctly and selects gb200-2gpu.
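The reported result matches a tightest-fit rule: among the candidates with at least minMemory, pick the one with the least memory. The sketch below assumes that minMemory: 384 means 384 GiB and uses a hypothetical Candidate type and bestFit function. OME's actual scoring in calculateBestFitScore may weigh other factors.

```go
package main

import "fmt"

// Candidate is a hypothetical, simplified view of an AcceleratorClass:
// its name and total memory in GiB.
type Candidate struct {
	Name      string
	MemoryGiB float64
}

// bestFit returns the candidate with the smallest memory that still meets
// minGiB, i.e. the tightest fit. ok is false when nothing qualifies.
func bestFit(cands []Candidate, minGiB float64) (best Candidate, ok bool) {
	for _, c := range cands {
		if c.MemoryGiB < minGiB {
			continue // too small for the request
		}
		if !ok || c.MemoryGiB < best.MemoryGiB {
			best, ok = c, true
		}
	}
	return best, ok
}

func main() {
	cands := []Candidate{
		{"gb200-1gpu", 192},
		{"gb200-2gpu", 384},
		{"gb200-4gpu", 768},
	}
	if c, ok := bestFit(cands, 384); ok {
		fmt.Println(c.Name) // gb200-2gpu
	}
}
```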
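Important detail: AcceleratorClass.spec.capabilities.memoryGB must be specified as a Kubernetes quantity with a unit suffix, for example memoryGB: 192Gi as in the AcceleratorClass objects above. A bare number without a suffix (for example 192) is interpreted as bytes and causes memory filtering to fail.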
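In Kubernetes quantities, a number without a suffix is in base units. So 192 means 192 bytes, while 192Gi means 192 × 2^30 bytes. The hypothetical quantityToGiB helper below is a simplified, stdlib-only stand-in for resource.ParseQuantity in k8s.io/apimachinery. It handles only the binary suffixes Ki, Mi, Gi and Ti, not decimal suffixes or exponents. It shows why a bare 192 lands far below any GiB-sized memory threshold.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binarySuffixes covers only the suffixes relevant to this report; a real
// implementation should use the Kubernetes resource.Quantity type.
var binarySuffixes = map[string]float64{
	"Ki": 1 << 10,
	"Mi": 1 << 20,
	"Gi": 1 << 30,
	"Ti": 1 << 40,
}

// quantityToGiB (hypothetical) converts a quantity string to GiB.
// A bare number is treated as bytes, matching Kubernetes semantics.
func quantityToGiB(q string) (float64, error) {
	q = strings.TrimSpace(q)
	mult := 1.0
	for suf, m := range binarySuffixes {
		if strings.HasSuffix(q, suf) {
			q, mult = strings.TrimSuffix(q, suf), m
			break
		}
	}
	n, err := strconv.ParseFloat(q, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid quantity: %w", err)
	}
	return n * mult / (1 << 30), nil
}

func main() {
	for _, q := range []string{"192Gi", "192"} {
		gib, err := quantityToGiB(q)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %g GiB\n", q, gib)
	}
}
```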
Environment
- OME version: 0.1.5
- Kubernetes version (use kubectl version): v1.32.8
- Cloud provider or hardware configuration: 2 GPU nodes, 4 × NVIDIA GB200 GPUs each
- OS: Ubuntu 24.04.4 LTS
- Runtime: SGLang image docker.io/lmsysorg/sglang:dev-cu13
- Model being served: Qwen/Qwen3-0.6B
- Install method: Helm OCI from the docs