
Error when running `bash run.sh` after the model is fitted #31

Description

@Taha90410

Hi. I just ran the `run.sh` file, and it started fitting the model. Once fitting finished, it failed with `ModuleNotFoundError: No module named 'torch.distributed.device_mesh'`. Here are some more details:

```
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import packaging  # type: ignore[attr-defined]
2025-08-27 22:49:39,503|    INFO| Info:
{ 'dataset': { 'aug_jitter': False,
               'data_root': '../data/dataset.h5',
               'img_size': 1024,
               'num_samples': 20480,
               'white_bg': False},
  'embedder': {'color_freq': 0, 'shape_freq': 0},
  'global': { 'config': 'recon/config.yaml',
              'exp_name': 'train-recon',
              'log_level': 20,
              'resume': 'checkpoints/recon_model.pth',
              'save_root': '../checkpoints',
              'seed': 2434},
  'losses': { 'lambda_2D': 1.0,
              'lambda_nrm': 1.0,
              'lambda_rgb': 1.0,
              'lambda_sdf': 10.0,
              'use_mask': False,
              'use_pred_nrm': False},
  'network': { 'activation': 'lrelu',
               'feat_dim': 16,
               'hidden_dim': 512,
               'layer_type': 'none',
               'num_layers': 5,
               'pos_dim': 8,
               'skip': [2, 3, 4]},
  'optimizer': { 'beta1': 0.5,
                 'beta2': 0.999,
                 'lr_decoder': 0.001,
                 'lr_encoder': 0.0001,
                 'weight_decay': 0.0},
  'optional arguments': { 'help': None,
                          'save_uv': False,
                          'test_folder': 'data/examples'},
  'positional arguments': {},
  'scheduler': { 'lr_num_cycles': 1,
                 'lr_power': 1.0,
                 'lr_scheduler': 'constant_with_warmup',
                 'lr_warmup_steps': 500,
                 'max_grad_norm': 1.0},
  'train': { 'batch_size': 8,
             'epochs': 5000,
             'log_every': 500,
             'save_every': 50,
             'workers': 8},
  'validation': { 'erode_iter': 0,
                  'grid_size': 512,
                  'num_valid_samples': 5,
                  'subdivide': True,
                  'valid': False,
                  'valid_every': 50,
                  'valid_folder': '../data/examples'},
  'wandb': {'wandb': False, 'wandb_id': None, 'wandb_name': 'train-recon'}}
```

How can I fix this?
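A quick first check is to confirm which interpreter `run.sh` actually uses and whether that environment's PyTorch ships `torch.distributed.device_mesh`. My understanding is that this module only exists in newer PyTorch releases (roughly 2.2 onward; please verify against the PyTorch release notes), so an older install combined with a newer library that imports it would produce exactly this error. This is an assumption, not a confirmed diagnosis. The sketch below is not part of this repo: `diagnose_torch` is a hypothetical helper that only reports facts and changes nothing. Run it with the same Python that `run.sh` invokes.

```python
import importlib
import sys


def diagnose_torch(module: str = "torch.distributed.device_mesh") -> dict:
    """Hypothetical helper: report the interpreter, the torch version, and whether `module` imports."""
    report = {
        "python_executable": sys.executable,  # which environment is running
        "torch_version": None,
        "module_importable": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter at all.
        return report

    report["torch_version"] = torch.__version__
    try:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
        importlib.import_module(module)
        report["module_importable"] = True
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for key, value in diagnose_torch().items():
        print(f"{key}: {value}")
```

If `module_importable` comes back `False`, the full traceback (not just the last line) should show which package performs the import. Upgrading PyTorch, or pinning that package to a release that still supports your PyTorch version, are the usual directions to try.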
