Hi. I just ran the `run.sh` script and it started fitting the model. Once fitting finished, it failed with `ModuleNotFoundError: No module named 'torch.distributed.device_mesh'`. Here are some more details:
```
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
from pkg_resources import packaging # type: ignore[attr-defined]
2025-08-27 22:49:39,503| INFO| Info:
{ 'dataset': { 'aug_jitter': False,
               'data_root': '../data/dataset.h5',
               'img_size': 1024,
               'num_samples': 20480,
               'white_bg': False},
  'embedder': {'color_freq': 0, 'shape_freq': 0},
  'global': { 'config': 'recon/config.yaml',
              'exp_name': 'train-recon',
              'log_level': 20,
              'resume': 'checkpoints/recon_model.pth',
              'save_root': '../checkpoints',
              'seed': 2434},
  'losses': { 'lambda_2D': 1.0,
              'lambda_nrm': 1.0,
              'lambda_rgb': 1.0,
              'lambda_sdf': 10.0,
              'use_mask': False,
              'use_pred_nrm': False},
  'network': { 'activation': 'lrelu',
               'feat_dim': 16,
               'hidden_dim': 512,
               'layer_type': 'none',
               'num_layers': 5,
               'pos_dim': 8,
               'skip': [2, 3, 4]},
  'optimizer': { 'beta1': 0.5,
                 'beta2': 0.999,
                 'lr_decoder': 0.001,
                 'lr_encoder': 0.0001,
                 'weight_decay': 0.0},
  'optional arguments': { 'help': None,
                          'save_uv': False,
                          'test_folder': 'data/examples'},
  'positional arguments': {},
  'scheduler': { 'lr_num_cycles': 1,
                 'lr_power': 1.0,
                 'lr_scheduler': 'constant_with_warmup',
                 'lr_warmup_steps': 500,
                 'max_grad_norm': 1.0},
  'train': { 'batch_size': 8,
             'epochs': 5000,
             'log_every': 500,
             'save_every': 50,
             'workers': 8},
  'validation': { 'erode_iter': 0,
                  'grid_size': 512,
                  'num_valid_samples': 5,
                  'subdivide': True,
                  'valid': False,
                  'valid_every': 50,
                  'valid_folder': '../data/examples'},
  'wandb': {'wandb': False, 'wandb_id': None, 'wandb_name': 'train-recon'}}
```
How can I fix this?
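One way to narrow this down is to confirm which PyTorch version the environment actually has and whether it ships the missing module. The sketch below uses only the standard-library `importlib.util.find_spec`, so it does not depend on anything from this repo. As far as I know, the public `torch.distributed.device_mesh` module appeared around PyTorch 2.2, so an older torch install (or a dependency that expects a newer torch) is one plausible cause. That is an assumption, not something the log above confirms. The helper name `has_module` is hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Hypothetical helper: return True if `name` is importable.

    For dotted names, find_spec imports the parent packages first,
    so a missing parent (e.g. no `torch` at all) raises ImportError,
    which is treated here as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    try:
        import torch
        print("torch version:", torch.__version__)
    except ImportError:
        print("torch is not installed in this environment")
    # Assumption: this module only exists in newer PyTorch releases.
    print("torch.distributed.device_mesh available:",
          has_module("torch.distributed.device_mesh"))
```

Running it with the same Python interpreter that `run.sh` uses should show whether the module is simply absent from the installed torch, or whether the import is failing for some other reason.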