Skip to content

Skip OCI Object Storage download for Ready same-path artifacts#629

Open
op109lvb wants to merge 2 commits into
mainfrom
fix-copy-model
Open

Skip OCI Object Storage download for Ready same-path artifacts#629
op109lvb wants to merge 2 commits into
mainfrom
fix-copy-model

Conversation

@op109lvb

@op109lvb op109lvb commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

What this PR does

Speeds up copied model reconciliation when the copied CR uses OCI Object Storage and points to the same storage path as a model artifact that is already Ready on the same node.

For same-path object storage reuse, model-agent skips the expensive download/integrity-validation path instead of re-running file validation.

The implementation looks at this node’s Ready model entries and reuses only when the storage URI, path, schema path, storage key, parameters, and resolved destination path match exactly, and the local destination path already exists.

Why we need it

Copied model CRs can point to the same object storage path as an already Ready model on the same node. In that case, re-downloading or re-validating all files adds unnecessary reconciliation latency.

This speeds up copied model CR reconciliation while preserving safety:

  • reuse only happens for OCI Object Storage models
  • reuse only happens when the storage URI/path/params match exactly
  • reuse only happens if the local destination path exists on that node
  • nodes without a local Ready artifact continue through the normal download/validation flow
  • Hugging Face continues using its existing SHA-based artifact reuse logic

How to test

go test -count=1 ./pkg/modelagent ./pkg/controller/v1beta1/basemodel
ok  	sigs.k8s.io/ome/pkg/modelagent	1.715s
ok  	sigs.k8s.io/ome/pkg/controller/v1beta1/basemodel	0.493s

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@github-actions github-actions Bot added model-agent Model agent changes tests Test changes labels Jun 12, 2026
@op109lvb op109lvb changed the title Reuse Ready same-path model artifacts to reduce model copy latency Reuse Ready same-path model artifacts to reduce BaseModel copy latency Jun 12, 2026
@op109lvb op109lvb changed the title Reuse Ready same-path model artifacts to reduce BaseModel copy latency Reuse Ready same-path model artifacts to reduce model copy latency Jun 12, 2026
@op109lvb op109lvb changed the title Reuse Ready same-path model artifacts to reduce model copy latency Skip object storage download for Ready same-path artifacts Jun 13, 2026
@op109lvb op109lvb changed the title Skip object storage download for Ready same-path artifacts Skip OCI Object Storage download for Ready same-path artifacts Jun 13, 2026
Comment thread pkg/modelagent/gopher.go
}

func sameModelStoragePath(currentStorage *v1beta1.StorageSpec, candidateStorage *v1beta1.StorageSpec, modelRootDir string, destPath string) bool {
if currentStorage == nil || candidateStorage == nil || currentStorage.Path == nil || candidateStorage.Path == nil {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to handle the case where the getDestPath is empty?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add.
getDestPath already falls back to modelRootDir + "/" + storageUri when storage.path is empty, and reuse still requires os.Stat(destPath) to pass.

Comment thread pkg/modelagent/gopher.go
baseModels, err := s.baseModelLister.List(labels.Everything())
if err == nil {
for _, model := range baseModels {
key := constants.GetModelConfigMapKey(model.Namespace, model.Name, false)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe we can extract reusable part between cluster basemodel and basemodel.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I kept the BaseModel and ClusterBaseModel lookup loops separate because that matches the existing repo pattern: they are different CR types with different listers and namespace/key semantics. The shared comparison logic is already extracted into sameModelStoragePath, so I’d avoid a broader refactor in this focused PR.

Comment thread pkg/modelagent/gopher.go Outdated
Comment thread pkg/modelagent/gopher.go
@pallasathena92

Copy link
Copy Markdown
Collaborator

we have this before f8d263f
which enable HuggingFace to leverage sha for artifact reuse.
we can use it in oci object storage as well

@op109lvb

Copy link
Copy Markdown
Collaborator Author

we have this before f8d263f which enable HuggingFace to leverage sha for artifact reuse. we can use it in oci object storage as well

For the existing HF reuse path: It works because HF has a model-level commit SHA that model-agent can fetch before download and compare with the SHA recorded in the node ConfigMap. OCI Object Storage does not have the same model-level SHA in the current implementation.

To prove an existing OCI local copy matches, the current path has to do per-object size/MD5 validation, which is the expensive work this PR is trying to skip for copied CRs.
So I kept this PR scoped to OCI same-path Ready reuse: if the storage identity and resolved local destination are exactly the same and already Ready on this node, we skip downloadModel and avoid re-running per-file validation.

@op109lvb op109lvb requested a review from truddy0 June 23, 2026 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

model-agent Model agent changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants