feat: add JoyImage edit plus by tangyanf · Pull Request #14032 · huggingface/diffusers

tangyanf · 2026-06-22T06:54:14Z

Description

We are the JoyAI Team, and this is the Diffusers implementation for the JoyAI-Image-Edit-Plus model.

GitHub Repository: [https://github.com/jd-opensource/JoyAI-Image]
Hugging Face Model: [https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers]
Original opensource weights: [https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus]

Model Overview

JoyAI-Image-Edit-Plus extends JoyAI-Image-Edit with multi-image editing capabilities. While JoyAI-Image-Edit operates on a single reference image, Edit-Plus accepts multiple reference
images as input and performs instruction-guided editing across them — enabling tasks such as subject composition, style transfer from multiple sources, and multi-view consistent editing.

It combines an 8B Multimodal Large Language Model (MLLM) with a 16B Multimodal Diffusion Transformer (MMDiT), supporting variable-resolution reference images that are independently
encoded and jointly denoised.

Key Features

Multi-Image Input: Accepts multiple reference images with different resolutions, enabling complex editing scenarios that require information from multiple visual sources.
Subject Composition: Combine elements from separate images into a coherent output guided by text instructions (e.g., "Let the person lovingly play with the dog" given separate person
and dog images).
Cross-Image Style Transfer: Apply style or attributes from one reference image to subjects in another.
Variable Resolution Support: Each reference image is independently resized and encoded at its optimal resolution, preserving fine-grained details regardless of input size.
Instruction-Guided Generation: Natural language prompts control how multiple reference images are composed and edited in the final output.

github-actions · 2026-06-22T09:30:53Z

Hi @tangyanf, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. Fixes #1234) to the PR description so the issue is linked. See the contribution guide for more details. If this PR intentionally does not fix a tracked issue, a maintainer can add the no-issue-needed label to silence this reminder.

sergereview

🤗 Serge says:

This PR adds the JoyImage Edit Plus model and pipeline. There are several blocking issues that need to be addressed before merging.

Blocking — Debug artifacts left in production code

Multiple torch.save() calls, a print() statement, and a commented-out exit(0) are left in pipeline_joyimage_edit_plus.py. These will write files to the user's working directory and print to stdout during every inference call.
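A quick way to catch leftovers like these before pushing is to scan the module's syntax tree for suspicious calls. The sketch below is a hypothetical helper, not part of diffusers or its tooling; the names `DEBUG_CALLS`, `dotted_name`, and `find_debug_calls` are assumptions made for illustration, and the sample source only mimics the lines quoted in this review.

```python
import ast

# Hypothetical helper (not part of diffusers): call names treated as debug leftovers.
DEBUG_CALLS = {"print", "exit", "breakpoint", "torch.save"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_debug_calls(source):
    """Return sorted (line_number, call_name) pairs for calls listed in DEBUG_CALLS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DEBUG_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


# Sample source imitating the flagged lines; it is parsed, never executed.
sample = '''
import torch
def encode(prompt_embeds, negative_prompt):
    torch.save(prompt_embeds, "prompt_embeds.pt")
    print(f"negative_prompt: {negative_prompt}")
    # exit(0)
    return prompt_embeds
'''
hits = find_debug_calls(sample)
```

Comments are not part of the syntax tree, so the commented-out `exit(0)` is not reported by this approach and would still need a text search.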

Blocking — `einops` dependency

Per .ai/models.md: "No new mandatory dependency without discussion (e.g. einops). Optional deps guarded with is_X_available() and a dummy in utils/dummy_*.py." The pipeline imports rearrange directly (`from einops import rearrange`), and this is the only non-comment usage of einops in src/diffusers/. The rearrange calls should be rewritten with native PyTorch (reshape, permute, unflatten).

Blocking — `sglang` integration code in model forward

The transformer's forward method contains sglang-specific code: list-unwrapping for "SglangXvideo CFG branches" (lines 272-276) and a try: from sglang... fallback (lines 279-287). Per .ai/AGENTS.md: "No defensive code, unused code paths, or legacy stubs — do not add fallback paths, safety checks, or configuration options 'just in case'." This code doesn't belong in the diffusers model — the pipeline always passes the required arguments.

Blocking — Missing dummy objects

JoyImageEditPlusTransformer3DModel, JoyImageEditPlusPipeline, and JoyImageEditPlusPipelineOutput are not registered in dummy_pt_objects.py / dummy_torch_and_transformers_objects.py. This will cause ImportError when torch/transformers are not installed.
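For context, a dummy object is a placeholder class that raises a helpful error when a required backend is missing. The sketch below imitates that pattern with simplified stand-ins so it runs on its own: `AVAILABLE_BACKENDS` is an assumption for the demo, and this `DummyObject` and `requires_backends` are reduced versions written for illustration rather than the real helpers from diffusers.utils. In the repository the dummy files are autogenerated by `make fix-copies` rather than edited by hand.

```python
# Simplified stand-ins for illustration only; the real DummyObject and
# requires_backends are provided by diffusers.utils.
AVAILABLE_BACKENDS = set()  # assumption: pretend torch and transformers are missing


def requires_backends(obj, backends):
    """Raise ImportError naming the class and every backend that is unavailable."""
    missing = [b for b in backends if b not in AVAILABLE_BACKENDS]
    if missing:
        name = getattr(obj, "__name__", type(obj).__name__)
        raise ImportError(f"{name} requires the following backends: {', '.join(missing)}")


class DummyObject(type):
    """Metaclass: accessing an undefined public class attribute raises the backend error."""

    def __getattr__(cls, key):
        if key.startswith("_"):
            raise AttributeError(key)
        requires_backends(cls, cls._backends)


class JoyImageEditPlusPipeline(metaclass=DummyObject):
    _backends = ["torch", "transformers"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["torch", "transformers"])

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, ["torch", "transformers"])


try:
    # "some/repo" is a placeholder; the call fails before it is ever used.
    JoyImageEditPlusPipeline.from_pretrained("some/repo")
except ImportError as err:
    message = str(err)
```

With placeholders of this shape, the import of the class name itself succeeds and the failure is deferred to the first real use, with a message that names the missing backends.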

Blocking — Missing tests

No test files were added for the new model or pipeline.

Blocking — Hardcoded `device_type="cuda"` in `torch.autocast`

torch.autocast(device_type="cuda", ...) is hardcoded in two places in the pipeline. This will fail on MPS, XPU, and other non-CUDA devices.

Non-blocking — Inlined scheduler sigma math

Per .ai/pipelines.md gotcha #3, the pipeline manually computes shifted sigmas and temporarily overrides self.scheduler.shift — this is exactly what FlowMatchEulerDiscreteScheduler does with its shift config. The scheduler should own this logic.

Non-blocking — Unused imports and parameters

import inspect in transformer_joyimage_edit_plus.py is unused.
enable_denormalization parameter is declared in prepare_latents and __call__ but never read.
retrieve_timesteps is duplicated from the existing pipeline without a # Copied from annotation.

serge v0.1.0 · model: claude-opus-4-6 · 29 LLM turns · 50 tool calls · 190.2s · 1602502 in / 7369 out tokens

sergereview · 2026-06-22T19:31:04Z

+                max_sequence_length=max_sequence_length,
+            )
+
+        torch.save(prompt_embeds, "prompt_embeds.pt")


Debug artifact. torch.save(prompt_embeds, "prompt_embeds.pt") will write a file to the user's working directory on every inference call. Remove this and the other torch.save calls (lines 550, 582, 583).

sergereview · 2026-06-22T19:31:04Z

+        torch.save(prompt_embeds, "prompt_embeds.pt")
+        # Encode negative prompt for CFG
+        if self.do_classifier_free_guidance:
+            print(f"negative_prompt: {negative_prompt}")


Debug artifact. print(f"negative_prompt: {negative_prompt}") — remove this debug print statement.

sergereview · 2026-06-22T19:31:04Z

+        )
+        torch.save(padded_latents, "padded_latents.pt")
+        torch.save(target_mask, "target_mask.pt")
+        # exit(0)


Debug artifact. Remove the commented-out # exit(0).

sergereview · 2026-06-22T19:31:04Z

+
+import numpy as np
+import torch
+from einops import rearrange


Forbidden dependency. Per .ai/models.md: "No new mandatory dependency without discussion (e.g. einops)." This is the only real einops import in src/diffusers/. Rewrite the two rearrange calls (lines 339, 662-665) with native PyTorch.

For example, line 339:

# einops: rearrange(item, "c (t pt) (h ph) (w pw) -> (t h w) c pt ph pw", pt=pt, ph=ph, pw=pw)
patches = item.unflatten(1, (t//pt, pt)).unflatten(3, (h//ph, ph)).unflatten(5, (w//pw, pw))
patches = patches.permute(1, 3, 5, 0, 2, 4, 6).reshape(-1, c, pt, ph, pw)
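The ordering of the resulting patches is easy to get wrong, so it can help to cross-check the native rewrite against a slow loop. The sketch below is an illustration under stated assumptions: `patchify` and `patchify_reference` are hypothetical names, the input is assumed to have shape (c, T, H, W) with T, H, W being the full (unpatched) sizes, and the tensor sizes are arbitrary.

```python
import torch


def patchify(item, pt, ph, pw):
    """(c, T, H, W) -> (t*h*w, c, pt, ph, pw) with t=T//pt, h=H//ph, w=W//pw."""
    c, T, H, W = item.shape
    x = item.unflatten(1, (T // pt, pt))  # (c, t, pt, H, W)
    x = x.unflatten(3, (H // ph, ph))     # (c, t, pt, h, ph, W)
    x = x.unflatten(5, (W // pw, pw))     # (c, t, pt, h, ph, w, pw)
    # Move the patch-grid axes (t, h, w) to the front, then merge them.
    return x.permute(1, 3, 5, 0, 2, 4, 6).reshape(-1, c, pt, ph, pw)


def patchify_reference(item, pt, ph, pw):
    """Loop version used only to cross-check the patch ordering (t outermost, w innermost)."""
    c, T, H, W = item.shape
    patches = []
    for ti in range(T // pt):
        for hi in range(H // ph):
            for wi in range(W // pw):
                patches.append(
                    item[:, ti * pt:(ti + 1) * pt, hi * ph:(hi + 1) * ph, wi * pw:(wi + 1) * pw]
                )
    return torch.stack(patches)


item = torch.arange(3 * 4 * 4 * 6, dtype=torch.float32).reshape(3, 4, 4, 6)
fast = patchify(item, pt=2, ph=2, pw=2)
slow = patchify_reference(item, pt=2, ph=2, pw=2)
```

Here `fast` and `slow` should match element for element, which is the same check one would run against the einops output before deleting the import.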

sergereview · 2026-06-22T19:31:04Z

+        batch_size, max_num_patches, channels, pt, ph, pw = hidden_states.shape
+        device = hidden_states.device
+
+        # Unwrap list inputs (SglangXvideo passes these as lists from CFG branches)


Defensive / framework-specific code. Per .ai/AGENTS.md: "No defensive code, unused code paths, or legacy stubs." These list-unwrapping guards for "SglangXvideo" don't belong in the diffusers model. The pipeline always passes tensors. Remove lines 272-276.

Suggested change

# Unwrap list inputs (SglangXvideo passes these as lists from CFG branches)

sergereview · 2026-06-22T19:31:04Z

+                    ref_tensor = torch.from_numpy(np.array(ref_img_pil.convert("RGB"))).to(device=device, dtype=dtype)
+                    ref_tensor = (ref_tensor / 127.5 - 1.0).permute(2, 0, 1).unsqueeze(1).unsqueeze(0)
+
+                    with torch.autocast(device_type="cuda", dtype=torch.float32):


Hardcoded CUDA device type. torch.autocast(device_type="cuda", ...) will fail on non-CUDA devices (MPS, XPU, etc.). Use the device from the tensor:

Suggested change

- with torch.autocast(device_type="cuda", dtype=torch.float32):
+ with torch.autocast(device_type=device.type, dtype=torch.float32):

Same issue on line 670.
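As a small illustration of the device-agnostic form, the sketch below derives `device_type` from a tensor instead of hardcoding it. `autocast_for` is a hypothetical helper name, and the demo uses a CPU tensor with bfloat16 only so that it runs without a GPU; which dtypes a given backend accepts under autocast depends on the installed PyTorch version.

```python
import torch


def autocast_for(tensor, dtype):
    """Open autocast on whichever backend `tensor` lives on (cuda, cpu, mps, xpu, ...)."""
    return torch.autocast(device_type=tensor.device.type, dtype=dtype)


x = torch.randn(4, 4)  # CPU tensor in this demo, so device_type resolves to "cpu"
with autocast_for(x, torch.bfloat16):
    y = x @ x  # matmul is autocast-eligible, so its output follows the autocast dtype
result_dtype = y.dtype
```

The same call site then works unchanged when the tensor is moved to another accelerator, which is the point of the suggested change.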

sergereview · 2026-06-22T19:31:05Z

+        device: torch.device,
+        generator: Optional[Union[torch.Generator, List[torch.Generator]]],
+        reference_images: Optional[List[List[Image.Image]]] = None,
+        enable_denormalization: bool = True,


Unused parameter. enable_denormalization is declared but never read inside prepare_latents. Either use it or remove it from both prepare_latents and __call__.

sergereview · 2026-06-22T19:31:05Z

+
+        # Prepare timesteps — compute sigmas with single shift to match original scheduler
+        if timesteps is None and sigmas is None:
+            shift = getattr(self.scheduler.config, "shift", 1.0)


Inlined scheduler math. Per .ai/pipelines.md gotcha #3, this manually computes shifted sigmas and temporarily overrides self.scheduler.shift — this is exactly what FlowMatchEulerDiscreteScheduler does with its shift config. Consider letting the scheduler own this:

self.scheduler.set_timesteps(num_inference_steps, device=device)
timesteps = self.scheduler.timesteps
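For reference, the static shift applied by FlowMatchEulerDiscreteScheduler maps each sigma to shift * sigma / (1 + (shift - 1) * sigma). The sketch below reproduces only that formula in plain Python; `shift_sigmas` and `linear_sigmas` are hypothetical names, the evenly spaced base schedule is an assumption made for the demo, and the shift value of 3.0 is illustrative rather than taken from the JoyAI configuration.

```python
def shift_sigmas(sigmas, shift):
    """Apply the static flow-matching shift to each sigma in [0, 1]."""
    return [shift * s / (1 + (shift - 1) * s) for s in sigmas]


def linear_sigmas(num_steps):
    """Evenly spaced sigmas from 1.0 down to 1/num_steps (assumed base schedule)."""
    return [1.0 - i / num_steps for i in range(num_steps)]


base = linear_sigmas(4)                   # [1.0, 0.75, 0.5, 0.25]
unshifted = shift_sigmas(base, shift=1.0)  # shift=1 leaves the schedule unchanged
shifted = shift_sigmas(base, shift=3.0)    # shift>1 pushes sigmas toward 1.0
```

Because shift = 1.0 is the identity and the endpoint sigma = 1.0 is fixed for any shift, a pipeline that sets the shift through the scheduler config should not need to recompute or temporarily override it.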

sergereview · 2026-06-22T19:31:05Z

+    Output class for JoyImage Edit Plus multi-image editing pipelines.
+    """
+
+    images: Union[List[PIL.Image.Image], np.ndarray]


Missing newline at end of file.

Suggested change (adds the trailing newline; the line itself is unchanged)

images: Union[List[PIL.Image.Image], np.ndarray]

sergereview · 2026-06-22T19:31:05Z

+                    self._pad_sequence(negative_prompt_embeds_mask, max_seq_len),
+                    self._pad_sequence(prompt_embeds_mask, max_seq_len),
+                ])
+        torch.save(prompt_embeds, 'prompt_embeds_2.pt')


Debug artifact. Remove torch.save(prompt_embeds, 'prompt_embeds_2.pt').

feat: add image edit plus

59e4801

github-actions Bot added the models, pipelines, and size/L (PR with diff > 200 LOC) labels on Jun 22, 2026

yiyixuxu added the no-issue-needed label (for PRs that do not require a link to an issue) on Jun 22, 2026

sergereview Bot requested changes Jun 22, 2026

tangyanf wants to merge 1 commit into huggingface:main from tangyanf:add-joyimage-edit-plus
