[Neuron] Add tensor parallel support for Neuron backend #13718
JingyaHuang wants to merge 29 commits into
… into add-neuron-backend
sayakpaul left a comment
Thanks for starting this!
I think it'd be simpler to keep the changes limited to a single model and a pipeline for iterating more quickly.
Additionally, a few thoughts:
I don't think we're exposing the TP config in modeling_utils.py. I think the enable_parallelism() method should accept it:
diffusers/src/diffusers/models/modeling_utils.py
Line 1585 in 86dab15
(and include all the necessary validation)
And then instead of manually iterating on transformer_blocks and single_transformer_blocks, we could try to configure that through class-level attributes, e.g., _tp_blocks or something like that.
After these changes, we could perhaps work on presenting some numbers where TP is beneficial, etc. WDYT?
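The class-level attribute idea above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the diffusers API: `_tp_blocks` is only the name floated in the comment, while `ToyModel`, `apply_tp`, the plan names, and the injected `parallelize` callable are hypothetical stand-ins (in the PR, the callable would be something like `parallelize_module` bound to the TP mesh).

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyModel:
    # Hypothetical class-level attribute: maps the name of a block container
    # on the model to the name of the sharding plan to use for its blocks.
    _tp_blocks = {
        "transformer_blocks": "double",
        "single_transformer_blocks": "single",
    }

    def __init__(self) -> None:
        # Strings stand in for real nn.Module blocks in this sketch.
        self.transformer_blocks = ["d0", "d1"]
        self.single_transformer_blocks = ["s0"]


def apply_tp(
    model: Any,
    plans: Dict[str, Any],
    parallelize: Callable[[Any, Any], None],
) -> List[Tuple[str, str]]:
    """Apply a plan to every block listed in the model's `_tp_blocks`."""
    tp_blocks = getattr(model, "_tp_blocks", None)
    if not tp_blocks:
        raise ValueError(f"{type(model).__name__} does not declare `_tp_blocks`.")

    applied = []
    for attr_name, plan_name in tp_blocks.items():
        if plan_name not in plans:
            raise ValueError(f"No plan named {plan_name!r} for {attr_name!r}.")
        for block in getattr(model, attr_name):
            parallelize(block, plans[plan_name])
            applied.append((attr_name, plan_name))
    return applied
```

With this shape, the helper no longer needs to know about `transformer_blocks` or `single_transformer_blocks`; each model class declares which containers to shard, and models without the attribute fail with a clear error.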
- lambda image_url_or_path: load_image(image_url_or_path)
- if urlparse(image_url_or_path).scheme
- else Image.open(image_url_or_path).convert("RGB")
+ lambda image_url_or_path: (
+     load_image(image_url_or_path)
+     if urlparse(image_url_or_path).scheme
+     else Image.open(image_url_or_path).convert("RGB")
+ )
Seems like an unrelated change?
A ``Flux2Transformer2DModel`` instance. Must have ``transformer_blocks``
and ``single_transformer_blocks`` attributes.
It cannot be specific to a particular model-type, right?
tp_mesh = config._mesh
if tp_mesh is None:
    raise ValueError(
        "`config._mesh` is None. Call `config.setup(rank, world_size, device)` before applying TP."
    )

for block in model.transformer_blocks:
    parallelize_module(block, tp_mesh, double_block_plan)

for block in model.single_transformer_blocks:
    parallelize_module(block, tp_mesh, single_block_plan)
Can we make it similar to
  latent_ids = latent_ids.unsqueeze(0).expand(batch_size, -1, -1)

- return latent_ids
+ return latent_ids.float()
👀
This doesn't seem like a related change?
A custom device mesh to use. If provided, ``tp_degree`` is inferred from
``mesh.size()`` and the argument is ignored. Useful when combining TP with
other parallelism strategies (e.g. CP) that share the same mesh.
Could you provide an example for this?
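One way such an example might look, sketched without any distributed setup: the snippet below only computes which ranks would share a TP group and which would share a CP group in a 2D mesh. It assumes a row-major `(cp, tp)` layout, which is how PyTorch's `init_device_mesh(device_type, (cp, tp), mesh_dim_names=("cp", "tp"))` arranges ranks. The helper name `mesh_groups` is hypothetical, and whether the PR's config accepts a mesh of this shape is an assumption.

```python
from typing import List, Tuple


def mesh_groups(world_size: int, tp_degree: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (tp_groups, cp_groups) for a row-major (cp, tp) rank grid."""
    if tp_degree < 1 or world_size % tp_degree != 0:
        raise ValueError("`world_size` must be a positive multiple of `tp_degree`.")
    cp_degree = world_size // tp_degree

    # Row `cp` of the grid holds the ranks that form one TP group.
    grid = [
        [cp * tp_degree + tp for tp in range(tp_degree)]
        for cp in range(cp_degree)
    ]
    tp_groups = grid
    # Each column holds the ranks that form one CP group.
    cp_groups = [list(column) for column in zip(*grid)]
    return tp_groups, cp_groups


tp_groups, cp_groups = mesh_groups(world_size=8, tp_degree=4)
print(tp_groups)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(cp_groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With 8 devices and a TP degree of 4, each rank belongs to one TP group of four ranks and one CP group of two ranks.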
if self.context_parallel_config is not None:
    self.context_parallel_config.setup(rank, world_size, device, mesh)
if self.tensor_parallel_config is not None:
    self.tensor_parallel_config.setup(rank, world_size, device, mesh)
Let's raise if both context_parallel_config and tensor_parallel_config are specified?
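A minimal sketch of the suggested check, assuming the validation lives in the `__post_init__` of a dataclass that holds both sub-configs. `ParallelConfigSketch` is a hypothetical stand-in; only the two field names come from the snippet under review.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ParallelConfigSketch:
    # Field names mirror the snippet above; the class itself is illustrative.
    context_parallel_config: Optional[Any] = None
    tensor_parallel_config: Optional[Any] = None

    def __post_init__(self) -> None:
        # Fail early, before any device mesh or process group is created.
        if (
            self.context_parallel_config is not None
            and self.tensor_parallel_config is not None
        ):
            raise ValueError(
                "`context_parallel_config` and `tensor_parallel_config` "
                "cannot both be specified."
            )
```

Raising at construction time keeps `setup()` free of mutual-exclusion logic and surfaces the misconfiguration before any distributed initialization runs.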
        if self.tp_degree < 1:
            raise ValueError("`tp_degree` must be >= 1.")

    def setup(
Where is this supposed to be called from?
What does this PR do?
This PR adds tensor parallel support for Neuron devices. Since TP isn't yet supported in diffusers, I followed the existing sequence parallel pattern and introduced a TensorParallelConfig.
This is still very much a work in progress. The goal at this stage is to surface the changes needed to enable TP on Neuron, not to land a stable implementation.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.