[BUG] `BestFit` accelerator policy without constraints causes controller panic

## What happened?

Creating an `InferenceService` with `acceleratorSelector.policy: BestFit` and no `constraints` causes the OME controller to panic during reconciliation.

The controller successfully finds the runtime, fetches accelerator candidates, and filters them:

```text
Fetched candidate accelerators {"count": 3}
Filtered candidates {"total": 3, "eligible": 3}
Candidates after filtering {"count": 3, "policy": "BestFit"}
```

It then panics with:

```text
runtime error: invalid memory address or nil pointer dereference
```

The stack trace points to:

```text
github.com/sgl-project/ome/pkg/acceleratorclassselector.calculateMemoryFitScore
/workspace/pkg/acceleratorclassselector/policy_helpers.go:200

github.com/sgl-project/ome/pkg/acceleratorclassselector.calculateBestFitScore
/workspace/pkg/acceleratorclassselector/policy_helpers.go:187

github.com/sgl-project/ome/pkg/acceleratorclassselector.(*defaultSelector).selectBestFit
/workspace/pkg/acceleratorclassselector/selector.go:281
```

## What did you expect to happen?

OME should not panic.

Either of the following would be acceptable:

1. `BestFit` selects the best-fitting `AcceleratorClass` from the runtime candidate list (or possibly from model metadata), or
2. OME returns a clear reconciliation error or status condition stating that `BestFit` requires explicit constraints such as `minMemory`.
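
The two options above can be sketched as follows. This is an illustration only: `Candidate`, `selectBestFit`, the `strict` flag, and `ErrConstraintsRequired` are hypothetical names, and "best fit" is assumed here to mean the smallest candidate that satisfies the memory minimum.

```go
package main

import (
	"errors"
	"fmt"
)

// Candidate is a hypothetical, simplified view of an AcceleratorClass.
type Candidate struct {
	Name     string
	MemoryGB int64
}

// ErrConstraintsRequired illustrates option 2: a clear error, not a panic.
var ErrConstraintsRequired = errors.New("BestFit requires constraints such as minMemory")

// selectBestFit picks the smallest candidate whose memory is at least
// minMemoryGB. A nil minMemoryGB means no constraint was given: with
// strict=true that is reported as an error (option 2); otherwise the
// minimum defaults to 0, so the smallest candidate wins (option 1).
func selectBestFit(cands []Candidate, minMemoryGB *int64, strict bool) (string, error) {
	var floor int64
	switch {
	case minMemoryGB != nil:
		floor = *minMemoryGB
	case strict:
		return "", ErrConstraintsRequired
	}

	best := -1
	for i, c := range cands {
		if c.MemoryGB < floor {
			continue // does not satisfy the minimum
		}
		if best == -1 || c.MemoryGB < cands[best].MemoryGB {
			best = i
		}
	}
	if best == -1 {
		return "", fmt.Errorf("no candidate with at least %d GB of memory", floor)
	}
	return cands[best].Name, nil
}

func main() {
	cands := []Candidate{{"gb200-1gpu", 192}, {"gb200-2gpu", 384}, {"gb200-4gpu", 768}}
	wanted := int64(384)

	fmt.Println(selectBestFit(cands, &wanted, false)) // explicit constraint
	fmt.Println(selectBestFit(cands, nil, false))     // option 1: fall back
	fmt.Println(selectBestFit(cands, nil, true))      // option 2: clear error
}
```

In a controller, the error from option 2 would typically be surfaced as a status condition on the `InferenceService` rather than only logged.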

## How can we reproduce it (as minimally and precisely as possible)?

Create a `ClusterServingRuntime` with valid accelerator candidates:

```yaml
apiVersion: ome.io/v1beta1
kind: ClusterServingRuntime
metadata:
  name: lab-sglang-gb200-candidates
spec:
  disabled: false
  acceleratorRequirements:
    acceleratorClasses:
      - gb200-1gpu
      - gb200-2gpu
      - gb200-4gpu
  supportedModelFormats:
    - modelFramework:
        name: transformers
        version: "4.51.0"
      modelFormat:
        name: safetensors
        version: "1.0.0"
      modelArchitecture: Qwen3ForCausalLM
      autoSelect: false
      priority: 1
  protocolVersions:
    - openAI
  modelSizeRange:
    min: 0.5B
    max: 1B
  engineConfig:
    runner:
      name: ome-container
      image: docker.io/lmsysorg/sglang:dev-cu13
      command:
        - sh
        - -c
        - sleep infinity
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
```

Create `AcceleratorClass` objects:

```yaml
apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-1gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 192Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "1"
---
apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-2gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 384Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "2"
---
apiVersion: ome.io/v1beta1
kind: AcceleratorClass
metadata:
  name: gb200-4gpu
spec:
  vendor: nvidia
  family: blackwell
  model: gb200
  capabilities:
    memoryGB: 768Gi
    features:
      - fp8
      - nvlink
  discovery:
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-GB200
  resources:
    - name: nvidia.com/gpu
      quantity: "4"
```

Create an `InferenceService` using `BestFit` without constraints:

```yaml
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: lab-policy-bestfit
  namespace: ome-lab
spec:
  acceleratorSelector:
    policy: BestFit
  model:
    name: qwen3-0-6b
  runtime:
    name: lab-sglang-gb200-candidates
  engine:
    minReplicas: 1
    maxReplicas: 1
```

## Anything else we need to know?

Using `BestFit` with explicit constraints works:

```yaml
acceleratorSelector:
  policy: BestFit
  constraints:
    minMemory: 384
```

With the above constraint, OME filters candidates correctly and selects `gb200-2gpu`.

Important detail: `AcceleratorClass.spec.capabilities.memoryGB` must be specified as a Kubernetes quantity, for example:

```yaml
memoryGB: 384Gi
```

Using:

```yaml
memoryGB: "384"
```

is interpreted as 384 bytes rather than 384 GiB, which causes memory filtering to fail.
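
The size of the gap between the two spellings can be shown with a small sketch. In Kubernetes quantity notation a bare number is in the base unit (bytes for memory) and the `Gi` suffix multiplies by 2^30. The `toBytes` helper below is a deliberately tiny, hypothetical stand-in for a real quantity parser, and the comparison assumes that `minMemory: 384` is treated as GiB; how OME actually converts these values is not confirmed here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const gib = int64(1) << 30 // bytes in one GiB

// toBytes is a hypothetical, minimal quantity parser. It understands only
// bare integers (taken as bytes) and the "Gi" suffix (2^30 bytes).
func toBytes(s string) (int64, error) {
	if strings.HasSuffix(s, "Gi") {
		v, err := strconv.ParseInt(strings.TrimSuffix(s, "Gi"), 10, 64)
		if err != nil {
			return 0, err
		}
		return v * gib, nil
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	required := int64(384) * gib // assumed meaning of minMemory: 384

	for _, spec := range []string{"384Gi", "384"} {
		got, err := toBytes(spec)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("memoryGB=%q -> %d bytes, satisfies minimum: %t\n",
			spec, got, got >= required)
	}
}
```

Under these assumptions, the `"384"` spelling falls short of the minimum by a factor of 2^30, so every candidate written that way would be filtered out.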

## Environment

* OME version: `0.1.5`
* Kubernetes version (use `kubectl version`): `v1.32.8`
* Cloud provider or hardware configuration: 2 GPU nodes, 4 × NVIDIA GB200 GPUs each
* OS: `Ubuntu 24.04.4 LTS`
* Runtime: SGLang image `docker.io/lmsysorg/sglang:dev-cu13`
* Model being served: `Qwen/Qwen3-0.6B`
* Install method: Helm OCI from the docs

