[Neuron] Add tensor parallel support for Neuron backend by JingyaHuang · Pull Request #13718 · huggingface/diffusers

JingyaHuang · 2026-05-11T14:17:44Z

What does this PR do?

This PR adds tensor parallel support for Neuron devices. Since TP isn't yet supported in diffusers, I followed the existing sequence parallel pattern and introduced a TensorParallelConfig.
This is still very much a work in progress. The goal at this stage is to surface the changes needed to enable TP on Neuron, not to land a stable implementation.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


HuggingFaceDocBuilderDev · 2026-05-11T14:30:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul

Thanks for starting this!

I think it'd be simpler to keep the changes limited to a single model and a pipeline for iterating more quickly.

Additionally, a few thoughts:

I don't think we're exposing the TP config in modeling_utils.py. I think the enable_parallelism() method should accept it:

diffusers/src/diffusers/models/modeling_utils.py

Line 1585 in 86dab15

def enable_parallelism(

(and include all the necessary validation)

And then instead of manually iterating on transformer_blocks and single_transformer_blocks, we could try to configure that through class-level attributes, e.g., _tp_blocks or something like that.

After these changes, we could perhaps work on presenting some numbers where TP is beneficial, etc. WDYT?
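As a rough illustration of the class-level attribute idea, the sketch below drives block selection from a `_tp_blocks` mapping instead of hard-coded loops. Everything in it is hypothetical: `ToyBlock`, `ToyTransformer`, `apply_tp_from_class_attrs`, the shape of `_tp_blocks` (container attribute name mapped to a plan name), and the `parallelize_fn` callback are assumptions made for illustration, not existing diffusers APIs.

```python
from typing import Any, Callable, Dict


class ToyBlock:
    """Stand-in for a transformer block (hypothetical, not a diffusers class)."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyTransformer:
    """Stand-in model. `_tp_blocks` is the hypothetical class-level attribute:
    it maps a block-container attribute name to the name of the plan to apply."""

    _tp_blocks: Dict[str, str] = {
        "transformer_blocks": "double",
        "single_transformer_blocks": "single",
    }

    def __init__(self) -> None:
        self.transformer_blocks = [ToyBlock("double_0"), ToyBlock("double_1")]
        self.single_transformer_blocks = [ToyBlock("single_0")]


def apply_tp_from_class_attrs(
    model: Any,
    plans: Dict[str, Any],
    parallelize_fn: Callable[[Any, Any], None],
) -> int:
    """Apply `parallelize_fn(block, plan)` to every block the model declares.

    Returns the number of blocks visited. The function never names a model
    class; it only reads the class-level `_tp_blocks` mapping.
    """
    tp_blocks = getattr(type(model), "_tp_blocks", None)
    if not tp_blocks:
        raise ValueError(
            f"{type(model).__name__} does not define `_tp_blocks`, "
            "so tensor parallelism cannot be configured for it."
        )

    visited = 0
    for attr_name, plan_name in tp_blocks.items():
        if plan_name not in plans:
            raise KeyError(f"No plan named '{plan_name}' was provided.")
        blocks = getattr(model, attr_name, None)
        if blocks is None:
            raise AttributeError(
                f"{type(model).__name__} has no attribute '{attr_name}'."
            )
        for block in blocks:
            parallelize_fn(block, plans[plan_name])
            visited += 1
    return visited
```

With this shape, supporting another model would only mean declaring `_tp_blocks` on its class, and a model that does not declare it fails early with a clear error.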

sayakpaul · 2026-06-08T09:48:25Z

-        lambda image_url_or_path: load_image(image_url_or_path)
-        if urlparse(image_url_or_path).scheme
-        else Image.open(image_url_or_path).convert("RGB")
+        lambda image_url_or_path: (
+            load_image(image_url_or_path)
+            if urlparse(image_url_or_path).scheme
+            else Image.open(image_url_or_path).convert("RGB")
+        )


Seems like an unrelated change?

sayakpaul · 2026-06-08T09:50:25Z

+            A ``Flux2Transformer2DModel`` instance. Must have ``transformer_blocks``
+            and ``single_transformer_blocks`` attributes.


It cannot be specific to a particular model-type, right?

sayakpaul · 2026-06-08T09:53:35Z

+    tp_mesh = config._mesh
+    if tp_mesh is None:
+        raise ValueError(
+            "`config._mesh` is None. Call `config.setup(rank, world_size, device)` before applying TP."
+        )
+
+    for block in model.transformer_blocks:
+        parallelize_module(block, tp_mesh, double_block_plan)
+
+    for block in model.single_transformer_blocks:
+        parallelize_module(block, tp_mesh, single_block_plan)


Can we make it similar to

diffusers/src/diffusers/hooks/context_parallel.py

Line 80 in f3d42be

def apply_context_parallel(

?
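One possible reading of this request is a model-agnostic entry point that takes a plan keyed by submodule-name patterns, rather than reaching into `transformer_blocks` directly. The sketch below is an assumption-laden illustration, not the signature of `apply_context_parallel`: the function name `apply_tensor_parallel`, the `*` wildcard convention, and the injected `parallelize_fn` are hypothetical. The `(name, module)` pairs have the same shape as what `torch.nn.Module.named_modules()` yields, and `parallelize_fn` is called with the same argument order as `parallelize_module(block, tp_mesh, plan)` in the diff above.

```python
import re
from typing import Any, Callable, Dict, Iterable, List, Tuple


def _pattern_to_regex(pattern: str) -> "re.Pattern":
    """Turn 'transformer_blocks.*' into a regex where '*' matches exactly
    one dotted name component (so nested children are not matched)."""
    parts = [
        r"[^.]+" if part == "*" else re.escape(part)
        for part in pattern.split(".")
    ]
    return re.compile(r"\.".join(parts))


def apply_tensor_parallel(
    named_modules: Iterable[Tuple[str, Any]],
    mesh: Any,
    plan: Dict[str, Any],
    parallelize_fn: Callable[[Any, Any, Any], None],
) -> List[str]:
    """Apply the first matching sub-plan to each named module.

    Returns the names of the modules that were parallelized.
    """
    if mesh is None:
        raise ValueError(
            "`mesh` is None; set up the parallel config before applying TP."
        )

    compiled = [(_pattern_to_regex(p), sub_plan) for p, sub_plan in plan.items()]
    matched: List[str] = []
    for name, module in named_modules:
        for regex, sub_plan in compiled:
            if regex.fullmatch(name):
                parallelize_fn(module, mesh, sub_plan)
                matched.append(name)
                break  # first matching pattern wins
    return matched
```

In a real integration the pairs would come from `model.named_modules()` and `parallelize_fn` would be `torch.distributed.tensor.parallel.parallelize_module`; the plan itself could live on the model class.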

sayakpaul · 2026-06-08T09:54:41Z

        latent_ids = latent_ids.unsqueeze(0).expand(batch_size, -1, -1)

-        return latent_ids
+        return latent_ids.float()


👀

This doesn't seem like a related change?

sayakpaul · 2026-06-08T09:55:59Z

+            A custom device mesh to use. If provided, ``tp_degree`` is inferred from
+            ``mesh.size()`` and the argument is ignored. Useful when combining TP with
+            other parallelism strategies (e.g. CP) that share the same mesh.


Could you provide an example for this?
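To make the shared-mesh case concrete, the sketch below works out which ranks end up in the same TP and CP groups for a 2D mesh of shape `(cp_degree, tp_degree)`. It assumes the row-major rank layout that `torch.distributed.device_mesh.init_device_mesh` produces, where a mesh created with `mesh_dim_names=("cp", "tp")` can be sliced as `mesh["tp"]`. The helper names are hypothetical, and how such a mesh would be handed to `TensorParallelConfig` is not shown in this thread.

```python
from typing import List, Tuple


def mesh_coords(rank: int, cp_degree: int, tp_degree: int) -> Tuple[int, int]:
    """Return (cp_index, tp_index) for `rank` in a row-major (cp, tp) mesh."""
    world_size = cp_degree * tp_degree
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}.")
    return rank // tp_degree, rank % tp_degree


def tp_group(rank: int, cp_degree: int, tp_degree: int) -> List[int]:
    """Ranks sharing this rank's row: they shard the same weights."""
    cp_index, _ = mesh_coords(rank, cp_degree, tp_degree)
    start = cp_index * tp_degree
    return list(range(start, start + tp_degree))


def cp_group(rank: int, cp_degree: int, tp_degree: int) -> List[int]:
    """Ranks sharing this rank's column: they split the same sequence."""
    _, tp_index = mesh_coords(rank, cp_degree, tp_degree)
    return list(range(tp_index, cp_degree * tp_degree, tp_degree))


if __name__ == "__main__":
    # 8 devices arranged as 2 CP groups x 4 TP ranks.
    for r in range(8):
        print(r, mesh_coords(r, 2, 4), tp_group(r, 2, 4), cp_group(r, 2, 4))
```

With 8 devices arranged as `cp_degree=2`, `tp_degree=4`, rank 5 sits at coordinates `(1, 1)`, shares weights with ranks `[4, 5, 6, 7]`, and splits the sequence with ranks `[1, 5]`.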

sayakpaul · 2026-06-08T09:56:44Z

        if self.context_parallel_config is not None:
            self.context_parallel_config.setup(rank, world_size, device, mesh)
+        if self.tensor_parallel_config is not None:
+            self.tensor_parallel_config.setup(rank, world_size, device, mesh)


Let's raise if both context_parallel_config and tensor_parallel_config are specified?
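A minimal sketch of such a check, using stand-in dataclasses so it runs on its own. The `Toy*` classes and their fields are hypothetical and only mirror the attribute names visible in the diff (`context_parallel_config`, `tensor_parallel_config`, `tp_degree`); where the check should actually live (construction time versus `setup`) is left open here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyContextParallelConfig:
    """Stand-in for the context parallel config (hypothetical field)."""

    degree: int = 1


@dataclass
class ToyTensorParallelConfig:
    """Stand-in for the tensor parallel config."""

    tp_degree: int = 1

    def __post_init__(self) -> None:
        # Same validation as in the diff under review.
        if self.tp_degree < 1:
            raise ValueError("`tp_degree` must be >= 1.")


@dataclass
class ToyParallelConfig:
    """Stand-in for the top-level config holding both sub-configs."""

    context_parallel_config: Optional[ToyContextParallelConfig] = None
    tensor_parallel_config: Optional[ToyTensorParallelConfig] = None

    def __post_init__(self) -> None:
        # Fail fast instead of silently setting up both strategies.
        if (
            self.context_parallel_config is not None
            and self.tensor_parallel_config is not None
        ):
            raise ValueError(
                "`context_parallel_config` and `tensor_parallel_config` "
                "cannot both be specified."
            )
```

Raising at construction time surfaces the conflict before any process group or mesh is created.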

sayakpaul · 2026-06-08T10:04:07Z

+        if self.tp_degree < 1:
+            raise ValueError("`tp_degree` must be >= 1.")
+
+    def setup(


Where is this supposed to be called from?

JingyaHuang and others added 25 commits March 18, 2026 11:15

draft:add neuron as a legit backend

98f6c8c

Merge branch 'huggingface:main' into add-neuron-backend

c58b8b8

Merge branch 'huggingface:main' into add-neuron-backend

3367409

Merge branch 'main' into add-neuron-backend

0c51734

feat: neuron-specific changes in the pipeline

a76953c

tests: eager tests

2480388

draft: start with tp for flux2

1469c04

fix: style

929ab72

Merge branch 'huggingface:main' into add-neuron-backend

52cac76

Merge branch 'huggingface:main' into support-neuron-tp

30cb353

Merge branch 'add-neuron-backend' of github.com:JingyaHuang/diffusers…

28a5086

… into add-neuron-backend

Merge branch 'huggingface:main' into support-neuron-tp

7fab0c4

Merge branch 'huggingface:main' into add-neuron-backend

68689e5

Merge branch 'main' into add-neuron-backend

da79308

fix:apr_02 beta

3bb9c7c

Merge branch 'add-neuron-backend' of github.com:JingyaHuang/diffusers…

c4facab

… into add-neuron-backend

feat:add wan

dff1f32

Merge branch 'huggingface:main' into support-neuron-tp

1c930c4

Merge branch 'huggingface:main' into add-neuron-backend

1eb5ff9

fix:pixart

cbe8f28

fix: rewrite flux swiglu activation to avoid gather op in neuron IR

16b9606

test: pixart compile mode on neuron

7f13f68

Merge branch 'main' into neuron-torch-comppile

a46cb19

cleanup & fix style

a354b88

Merge branch 'neuron-torch-comppile' into support-neuron-tp

931bb85

github-actions Bot added the size/L (PR with diff > 200 LOC), lora, models, tests, and utils labels on May 11, 2026

github-actions Bot added the pipelines, examples, and hooks labels on May 11, 2026

Merge branch 'main' into support-neuron-tp

9ab6dc3

sayakpaul reviewed Jun 8, 2026


Merge branch 'main' into support-neuron-tp

48fb75b

github-actions Bot removed the utils label Jun 22, 2026

JingyaHuang and others added 2 commits June 22, 2026 12:00

merge: another change

c350f7b

Merge branch 'main' into support-neuron-tp

644477a

github-actions Bot removed the tests label Jun 22, 2026

JingyaHuang wants to merge 29 commits into huggingface:main from JingyaHuang:support-neuron-tp

3 participants
