Request for an Example to Reproduce the Paper Code with Environment and Hardware Details

Hello,

I'm attempting to run and reproduce the results of the code provided in this repository, specifically the implementation related to [PowerSGD](https://github.com/epfml/powersgd). To ensure a smooth and accurate reproduction, could you please provide a detailed example or guide that includes the following information?

1. **Hardware Platform Details**:
   - Number and type of GPUs used.
   - Any specific hardware requirements or configurations.

2. **Software Environment**:
   - The version of PyTorch and CUDA.
   - Any other dependencies or libraries required to run the code.

3. **Execution Instructions**:
   - Detailed commands to launch the training process, especially if it involves distributed training using `python -m torch.distributed.launch` or `torchrun`.

Additionally, it appears that certain parts of the code might require adjustments to work correctly with the distributed launch utility. Specifically, the references to the `rank` variable in the following lines seem to be problematic when launching with `python -m torch.distributed.launch`:

- [train.py#L74](https://github.com/epfml/powersgd/blob/master/paper-code/train.py#L74)
- [train.py#L90-L91](https://github.com/epfml/powersgd/blob/master/paper-code/train.py#L90-L91)
- [train_pytorch.py#L76](https://github.com/epfml/powersgd/blob/master/paper-code/train_pytorch.py#L76)
- [train_pytorch.py#L92-L93](https://github.com/epfml/powersgd/blob/master/paper-code/train_pytorch.py#L92-L93)

These lines appear to read a `rank` variable directly, and that variable may not be set correctly when the processes are started by `torch.distributed.launch`.
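To make the question concrete, here is a minimal sketch of what I would expect the rank lookup to look like under a launcher. It assumes the launcher exports the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables, as `torchrun` does. The names `DistInfo` and `read_dist_info` are hypothetical and do not come from this repository, and I have not verified how the paper code actually sets `rank`.

```python
import os
from typing import Mapping, NamedTuple


class DistInfo(NamedTuple):
    """Hypothetical container for the identity of one training process."""
    rank: int        # global index of this process across all nodes
    local_rank: int  # index of this process on its own node (often the GPU id)
    world_size: int  # total number of processes in the job


def read_dist_info(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Read the process identity from launcher-provided environment variables.

    Falls back to a single-process setup (rank 0 of 1) when the variables
    are absent, so the script can still be run without a launcher.
    """
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    return DistInfo(rank, local_rank, world_size)


if __name__ == "__main__":
    info = read_dist_info()
    print(info)
    # Assumption: with PyTorch installed, the process group could then be
    # created from the same environment variables, for example:
    #   torch.distributed.init_process_group(backend="nccl", init_method="env://")
```

If the training scripts took `rank` and the number of workers from something like this instead of a hard-coded value, I would expect them to work with either launcher, but please correct me if the intended setup is different.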

Could you please clarify these aspects or suggest any necessary modifications to successfully run the distributed training as intended?

Thank you very much for your assistance and for sharing your work. I'm looking forward to successfully reproducing the results and exploring the capabilities of PowerSGD.

Best regards,
Lichen

