Driver upgrade documentation is misleading when on OpenShift #2006

@empovit

Problem

The GPU Driver Upgrades documentation states:

Upgrade the driver by changing the driver.version value in the cluster policy

This works on Kubernetes (Helm) but fails on OpenShift (OLM).

Behavior

On OpenShift, change only driver.version:

spec:
  driver:
    version: "570.172.08"

The above configuration results in:

  • Invalid driver image path with empty repository and image name: /:570.172.08-rhel9.6
  • Image pull fails
  • Driver pods fail to start
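The malformed path is what you would expect if the repository and image name are empty when the path is assembled. The following is a minimal sketch of that reasoning, assuming the path has the form repository/image:version-ostag, as the error above suggests. The function driver_image_path and its parameters are hypothetical and are not the operator's actual code.

```python
def driver_image_path(repository: str, image: str, version: str, os_tag: str) -> str:
    """Hypothetical model of the path format seen in this issue; not the operator's code."""
    return f"{repository}/{image}:{version}-{os_tag}"


# Only version is set (the OpenShift case): repository and image are empty strings.
broken = driver_image_path("", "", "570.172.08", "rhel9.6")

# All three fields are set.
fixed = driver_image_path("nvcr.io/nvidia", "driver", "570.172.08", "rhel9.6")

print(broken)  # /:570.172.08-rhel9.6
print(fixed)   # nvcr.io/nvidia/driver:570.172.08-rhel9.6
```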

Required workaround:

Provide values for all driver image properties:

spec:
  driver:
    repository: nvcr.io/nvidia
    image: driver
    version: "570.172.08"

Differences

  • Helm: Populates default repository and image values from chart into ClusterPolicy
  • OLM: ClusterPolicy has no defaults; operator relies on static CSV environment variables
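The difference can be illustrated with a simplified model. This sketch assumes the Helm chart defaults match the workaround values from this issue (nvcr.io/nvidia and driver), and it ignores the CSV environment variables entirely. HELM_DEFAULTS, helm_style_spec, olm_style_spec, and missing_fields are hypothetical names used only for illustration.

```python
# Assumed chart defaults, taken from the workaround values in this issue.
HELM_DEFAULTS = {"repository": "nvcr.io/nvidia", "image": "driver"}

REQUIRED_FIELDS = ("repository", "image", "version")


def helm_style_spec(user_values: dict) -> dict:
    """Chart defaults first, then user overrides: the rendered spec has every field."""
    return {**HELM_DEFAULTS, **user_values}


def olm_style_spec(user_values: dict) -> dict:
    """No defaulting step: the spec contains only what the user wrote."""
    return dict(user_values)


def missing_fields(spec: dict) -> list:
    """Return the required driver image fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not spec.get(name)]


user_values = {"version": "570.172.08"}

print(missing_fields(helm_style_spec(user_values)))  # []
print(missing_fields(olm_style_spec(user_values)))   # ['repository', 'image']
```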

Request

I would like your input before proposing a fix. There appear to be two options:

  1. Fix the code: provide default values for OLM deployments to match the Helm behavior
  2. Fix the docs: document that on OpenShift all three fields (repository, image, version) are required

Environment

  • OpenShift with GPU Operator installed via OLM

Labels: documentation, question
