
Make training checkpoints loadable under torch.load(weights_only=True) #802

Merged
kenko911 merged 4 commits into materialyzeai:main from YiqingChen524:fix/weights-only-safe-checkpoint-hparams on Jun 10, 2026

@YiqingChen524

Collaborator

ModelLightningModule and PotentialLightningModule called save_hyperparameters(ignore=["model"]), so the optimizer and scheduler objects (and, for PotentialLightningModule, the numpy element_refs array) were pickled into the checkpoint's hyper_parameters. Since torch 2.6 changed torch.load's default to weights_only=True, resuming training via Trainer.fit(ckpt_path=...) fails with an UnpicklingError as soon as the unpickler reaches those globals, because they are not on its allowlist.
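
The failure mode can be reproduced in miniature with the standard library. This is an analogy, not torch's implementation: RestrictedUnpickler, loads_under_allowlist and FakeOptimizer are hypothetical names, and the allowlist below is far smaller than the one torch uses for weights_only=True (which also admits tensors and other torch types). It shows why anything that pickles as a reference to a class (an optimizer, a scheduler, an ndarray) is rejected, while plain lists, dicts and numbers never trigger a global lookup.

```python
import io
import pickle
from typing import Any


class RestrictedUnpickler(pickle.Unpickler):
    """Analogy for torch.load(weights_only=True): every global lookup is refused
    unless it is on a small allowlist. Plain dicts, lists, strings, numbers,
    bools and None are rebuilt from pickle opcodes and never reach find_class."""

    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module: str, name: str) -> Any:
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowlisted")


def loads_under_allowlist(obj: Any) -> bool:
    """Pickle obj, then report whether the restricted unpickler accepts it."""
    data = pickle.dumps(obj)
    try:
        RestrictedUnpickler(io.BytesIO(data)).load()
    except pickle.UnpicklingError:
        return False
    return True


class FakeOptimizer:
    """Hypothetical stand-in for an optimizer object stored in hparams."""

    def __init__(self, lr: float) -> None:
        self.lr = lr


# Before the fix: an arbitrary object sits inside hyper_parameters.
before = {"hyper_parameters": {"optimizer": FakeOptimizer(1e-3), "cutoff": 5.0}}
# After the fix: only plain Python containers and scalars remain.
after = {"hyper_parameters": {"element_refs": [0.0, -1.5, -3.25], "cutoff": 5.0}}

print(loads_under_allowlist(before))  # False
print(loads_under_allowlist(after))   # True
```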

Exclude the optimizer and scheduler from save_hyperparameters; their state is already persisted in the checkpoint's optimizer_states and lr_schedulers entries, so excluding them loses no state. Store element_refs as a plain list instead of a numpy array (AtomRef already accepts a list). configure_optimizers is unaffected because it reads the self.optimizer / self.scheduler instance attributes, not hparams.
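
A minimal sketch of the shape of this fix, assuming nothing about the real modules beyond what is described above. DemoModule and build_hparams are hypothetical; the ignore list mirrors the idea of passing optimizer and scheduler to save_hyperparameters(ignore=...) without reproducing Lightning's implementation. Runtime objects stay on instance attributes, only plain values reach hparams, and configure_optimizers keeps reading the attributes.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence
from typing import Any, Optional


def build_hparams(init_args: dict[str, Any], ignore: Sequence[str]) -> dict[str, Any]:
    """Hypothetical helper: drop non-serializable constructor arguments and
    store element_refs as a plain list of Python floats."""
    hparams = {k: v for k, v in init_args.items() if k not in ignore}
    refs = hparams.get("element_refs")
    if refs is not None:
        # A numpy array would usually be converted with .tolist(); converting
        # item by item keeps this sketch free of a numpy dependency.
        hparams["element_refs"] = [float(x) for x in refs]
    return hparams


class DemoModule:
    """Hypothetical stand-in for a LightningModule, not the project's class."""

    def __init__(
        self,
        model: Any,
        element_refs: Optional[Iterable[float]] = None,
        optimizer: Any = None,
        scheduler: Any = None,
        lr: float = 1e-3,
    ) -> None:
        # Runtime objects stay on instance attributes...
        self.model = model
        self.optimizer = optimizer
        self.scheduler = scheduler
        # ...while only plain values go into the checkpoint's hyper_parameters.
        self.hparams = build_hparams(
            {
                "model": model,
                "element_refs": element_refs,
                "optimizer": optimizer,
                "scheduler": scheduler,
                "lr": lr,
            },
            ignore=["model", "optimizer", "scheduler"],
        )

    def configure_optimizers(self) -> tuple[list[Any], list[Any]]:
        # Reads the attributes, so removing them from hparams changes nothing here.
        return [self.optimizer], [self.scheduler]
```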

This PR fixes the issue reported by Chao Yang @Y-Chao.

Adds a regression test asserting that the optimizer and scheduler objects are
absent from hparams, that element_refs is a list, and that a checkpoint built
from these hparams loads under weights_only=True.
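
The regression checks described above could take roughly this shape. As a stdlib-only proxy for "loads under weights_only=True", the hypothetical needs_globals scans the pickle opcode stream with pickletools and flags any opcode that resolves a global name. This is stricter than torch's allowlist, so a pass here is a conservative signal, not a substitute for actually calling torch.load in the real test; check_hparams and FakeScheduler are likewise illustrative names.

```python
from __future__ import annotations

import pickle
import pickletools
from typing import Any

# Pickle opcodes that make the unpickler resolve a module-level name.
GLOBAL_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "EXT1", "EXT2", "EXT4"}


def needs_globals(obj: Any) -> bool:
    """True if unpickling obj would require resolving any global name.

    Stricter than torch's weights_only allowlist (which also admits tensors
    and a few other types), so it errs on the side of flagging."""
    data = pickle.dumps(obj)
    return any(op.name in GLOBAL_OPCODES for op, _arg, _pos in pickletools.genops(data))


def check_hparams(hparams: dict[str, Any]) -> None:
    """Hypothetical regression-style assertions in the spirit of the PR's test."""
    assert "optimizer" not in hparams and "scheduler" not in hparams
    assert isinstance(hparams.get("element_refs", []), list)
    # Only plain containers and scalars: no global lookups in the payload.
    assert not needs_globals({"hyper_parameters": hparams})


class FakeScheduler:
    """Hypothetical stand-in for an LR scheduler object."""
```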

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
kenko911 merged commit 9e4573e into materialyzeai:main on Jun 10, 2026
9 of 10 checks passed
