Run test_dtensor with torchrun #2526

kshitij12345 · 2025-09-24T11:01:48Z

Context:
When test_dtensor.py is run the way it is today (pytest thunder/tests/distributed/test_dtensor.py), we see the following error for #2503: Error from segmentation group 1: The singleton Communicator isn't available.

Changes:

Move all the tests under class DTensorTest(DistributedParallelTestCase) into free functions with the test prefix, and run them with torchrun --nproc-per-node 2 --no-python pytest thunder/tests/distributed/test_dtensor.py. (This is required to run the nvFuser tests correctly.)
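To illustrate the shape of a free-function test launched this way, here is a minimal sketch. It assumes only that torchrun exports RANK, LOCAL_RANK and WORLD_SIZE to each worker process; the names LaunchInfo, read_launch_info and test_launch_info_is_consistent are hypothetical and are not taken from this PR.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchInfo:
    """Rank layout of the current worker process (hypothetical helper type)."""

    rank: int
    local_rank: int
    world_size: int


def read_launch_info(env=None) -> LaunchInfo:
    """Read the rank layout that torchrun exports to each worker process.

    Falls back to a single-process layout when the variables are absent,
    for example when the file is run with plain pytest.
    """
    env = os.environ if env is None else env
    return LaunchInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


def test_launch_info_is_consistent():
    # Hypothetical free-function test: pytest collects it by its `test`
    # prefix, and every torchrun worker runs it once with its own rank.
    info = read_launch_info()
    assert 0 <= info.rank < info.world_size
```

With torchrun --nproc-per-node 2, each of the two worker processes starts its own pytest session, so a test like this runs once per rank.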

cc @Borda

wujingyue

LGTM, but can you clarify the context? What makes it hard for nvFuser DTensor tests to work with DistributedParallelTestCase?

kshitij12345 · 2025-09-26T16:41:25Z

but can you clarify the context?

(I have also added this to the PR description.)
When test_dtensor.py is run the way it is today (pytest thunder/tests/distributed/test_dtensor.py), we see the following error for #2503: Error from segmentation group 1: The singleton Communicator isn't available.

What makes it hard for nvFuser DTensor tests to work with DistributedParallelTestCase?

I was seeing hangs in some tests when DistributedParallelTestCase is used together with torchrun --nproc-per-node 2 --no-python pytest thunder/tests/distributed/test_dtensor.py. I haven't found the exact cause yet.

KaelanDt

Thank you @kshitij12345

crcrpar · 2025-09-29T11:45:13Z

I was seeing hangs in some tests when DistributedParallelTestCase is used together with torchrun --nproc-per-node 2 --no-python pytest thunder/tests/distributed/test_dtensor.py. I haven't found the exact cause yet.

We can simply run DistributedParallelTestCase-based tests with pytest, and I think that is the intended way to run them; the PyTorch distributed elastic launch command is not needed. (I know you are having a problem running the dtensor tests with pytest.)

run test_dtensor with torchrun

961f24e

github-actions bot added the ci label Sep 24, 2025

update module level skip

7cf40ee

kshitij12345 force-pushed the update-dtensor-test-run branch from 8d40b21 to 7cf40ee Compare September 24, 2025 11:31

Merge branch 'main' into update-dtensor-test-run

4e3d9c9

kshitij12345 changed the title ~~[WIP] run test_dtensor with torchrun~~ Run test_dtensor with torchrun Sep 26, 2025

kshitij12345 requested review from wujingyue and rdspring1 September 26, 2025 14:16

kshitij12345 added the DTensor Issues about DTensor support in Thunder label Sep 26, 2025

kshitij12345 self-assigned this Sep 26, 2025

wujingyue approved these changes Sep 26, 2025

kshitij12345 marked this pull request as ready for review September 26, 2025 17:13

kshitij12345 requested review from KaelanDt, lantiga, t-vi and mruberry as code owners September 26, 2025 17:13

KaelanDt approved these changes Sep 27, 2025

Merge branch 'main' into update-dtensor-test-run

a149c1e

kshitij12345 enabled auto-merge (squash) September 29, 2025 07:49

kshitij12345 added 2 commits September 29, 2025 02:52

Add a session fixture to call destroy_process_group

b131e80

Merge branch 'update-dtensor-test-run' of https://github.com/Lightning-AI/lightning-thunder into update-dtensor-test-run

a55631c
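The commit "Add a session fixture to call destroy_process_group" is not shown in this conversation. A session-scoped cleanup of that kind could look roughly like the sketch below. It is an illustration, not the code from the commit: the names destroy_process_group_if_initialized and cleanup_process_group are hypothetical, and it assumes the fixture lives in the test module or a conftest.py.

```python
import pytest


def destroy_process_group_if_initialized(dist) -> bool:
    """Tear down the default process group if one exists.

    `dist` is expected to behave like the torch.distributed module.
    Returns True when a group was destroyed and False otherwise.
    """
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()
        return True
    return False


@pytest.fixture(scope="session", autouse=True)
def cleanup_process_group():
    # Let the whole test session run first, then clean up once per process.
    yield
    # Imported here so that importing this file does not itself require torch.
    import torch.distributed as dist

    destroy_process_group_if_initialized(dist)
```

Because each torchrun worker runs its own pytest session, a session-scoped autouse fixture gives every rank exactly one teardown after its last test.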
