feat: add JoyImage edit plus #14032
Hi @tangyanf, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. |
🤗 Serge says:
This PR adds the JoyImage Edit Plus model and pipeline. There are several blocking issues that need to be addressed before merging.
Blocking — Debug artifacts left in production code
Multiple torch.save() calls, a print() statement, and a commented-out exit(0) are left in pipeline_joyimage_edit_plus.py. These will write files to the user's working directory and print to stdout during every inference call.
Blocking — einops dependency
Per .ai/models.md: "No new mandatory dependency without discussion (e.g. einops). Optional deps guarded with is_X_available() and a dummy in utils/dummy_*.py." The pipeline directly imports from einops import rearrange — this is the only non-comment usage of einops in src/diffusers/. The rearrange calls should be rewritten with native PyTorch (reshape, permute, unflatten).
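As a rough illustration of the suggested rewrite, the patchify pattern `c (t pt) (h ph) (w pw) -> (t h w) c pt ph pw` can be expressed with `reshape` and `permute` alone. This is a minimal sketch, not the PR's code: the function name `patchify` and the toy tensor sizes are assumptions made for the example.

```python
import torch


def patchify(item: torch.Tensor, pt: int, ph: int, pw: int) -> torch.Tensor:
    """Hypothetical helper: split a (c, T, H, W) tensor into (t*h*w, c, pt, ph, pw) patches."""
    c, full_t, full_h, full_w = item.shape
    # Number of patches along each axis (assumes the sizes divide evenly).
    t, h, w = full_t // pt, full_h // ph, full_w // pw
    # Split every spatial/temporal axis into (patch index, offset inside patch).
    x = item.reshape(c, t, pt, h, ph, w, pw)
    # Move the three patch indices to the front, keep channels before the offsets.
    x = x.permute(1, 3, 5, 0, 2, 4, 6)
    # Flatten the patch indices into one sequence dimension.
    return x.reshape(t * h * w, c, pt, ph, pw)


# Toy input: 2 channels, T=4, H=6, W=8, cut into 2x3x4 patches.
item = torch.arange(2 * 4 * 6 * 8, dtype=torch.float32).reshape(2, 4, 6, 8)
patches = patchify(item, pt=2, ph=3, pw=4)
```

Each output row is one contiguous block of the input, so the result can be spot-checked by slicing the original tensor directly, without installing einops.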
Blocking — sglang integration code in model forward
The transformer's forward method contains sglang-specific code: list-unwrapping for "SglangXvideo CFG branches" (lines 272-276) and a try: from sglang... fallback (lines 279-287). Per .ai/AGENTS.md: "No defensive code, unused code paths, or legacy stubs — do not add fallback paths, safety checks, or configuration options 'just in case'." This code doesn't belong in the diffusers model — the pipeline always passes the required arguments.
Blocking — Missing dummy objects
JoyImageEditPlusTransformer3DModel, JoyImageEditPlusPipeline, and JoyImageEditPlusPipelineOutput are not registered in dummy_pt_objects.py / dummy_torch_and_transformers_objects.py. This will cause ImportError when torch/transformers are not installed.
Blocking — Missing tests
No test files were added for the new model or pipeline.
Blocking — Hardcoded device_type="cuda" in torch.autocast
torch.autocast(device_type="cuda", ...) is hardcoded in two places in the pipeline. This will fail on MPS, XPU, and other non-CUDA devices.
Non-blocking — Inlined scheduler sigma math
Per .ai/pipelines.md gotcha #3, the pipeline manually computes shifted sigmas and temporarily overrides self.scheduler.shift — this is exactly what FlowMatchEulerDiscreteScheduler does with its shift config. The scheduler should own this logic.
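For context, the static shift applied by flow-matching schedulers is commonly written as sigma' = shift * sigma / (1 + (shift - 1) * sigma). The sketch below only illustrates that formula; the helper name `shift_sigmas` and the linearly spaced toy schedule are assumptions for the example and are not taken from the PR or from the scheduler's source.

```python
import torch


def shift_sigmas(sigmas: torch.Tensor, shift: float) -> torch.Tensor:
    """Hypothetical helper: apply a static time shift to a sigma schedule."""
    return shift * sigmas / (1 + (shift - 1) * sigmas)


# Toy schedule for illustration only: 4 steps, linearly spaced from 1.0 down to 0.25.
num_inference_steps = 4
sigmas = torch.linspace(1.0, 1.0 / num_inference_steps, num_inference_steps)

# shift=1.0 leaves the schedule unchanged; larger shifts push sigmas toward 1.
unshifted = shift_sigmas(sigmas, shift=1.0)
shifted = shift_sigmas(sigmas, shift=3.0)
```

If the scheduler's `shift` config is set when the scheduler is constructed, the pipeline would not need to recompute this or override `self.scheduler.shift` at call time.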
Non-blocking — Unused imports and parameters
import inspect in transformer_joyimage_edit_plus.py is unused. The enable_denormalization parameter is declared in prepare_latents and __call__ but never read. retrieve_timesteps is duplicated from the existing pipeline without a # Copied from annotation.
serge v0.1.0 · model: claude-opus-4-6 · 29 LLM turns · 50 tool calls · 190.2s · 1602502 in / 7369 out tokens
| max_sequence_length=max_sequence_length, | ||
| ) | ||
| torch.save(prompt_embeds, "prompt_embeds.pt") |
Debug artifact. torch.save(prompt_embeds, "prompt_embeds.pt") will write a file to the user's working directory on every inference call. Remove this and the other torch.save calls (lines 550, 582, 583).
| torch.save(prompt_embeds, "prompt_embeds.pt") | ||
| # Encode negative prompt for CFG | ||
| if self.do_classifier_free_guidance: | ||
| print(f"negative_prompt: {negative_prompt}") |
Debug artifact. print(f"negative_prompt: {negative_prompt}") — remove this debug print statement.
| ) | ||
| torch.save(padded_latents, "padded_latents.pt") | ||
| torch.save(target_mask, "target_mask.pt") | ||
| # exit(0) |
Debug artifact. Remove the commented-out # exit(0).
| import numpy as np | ||
| import torch | ||
| from einops import rearrange |
Forbidden dependency. Per .ai/models.md: "No new mandatory dependency without discussion (e.g. einops)." This is the only real einops import in src/diffusers/. Rewrite the two rearrange calls (lines 339, 662-665) with native PyTorch.
For example, line 339:
# einops: rearrange(item, "c (t pt) (h ph) (w pw) -> (t h w) c pt ph pw", pt=pt, ph=ph, pw=pw)
patches = item.unflatten(1, (t//pt, pt)).unflatten(3, (h//ph, ph)).unflatten(5, (w//pw, pw))
patches = patches.permute(1, 3, 5, 0, 2, 4, 6).reshape(-1, c, pt, ph, pw)
| batch_size, max_num_patches, channels, pt, ph, pw = hidden_states.shape | ||
| device = hidden_states.device | ||
| # Unwrap list inputs (SglangXvideo passes these as lists from CFG branches) |
Defensive / framework-specific code. Per .ai/AGENTS.md: "No defensive code, unused code paths, or legacy stubs." These list-unwrapping guards for "SglangXvideo" don't belong in the diffusers model. The pipeline always passes tensors. Remove lines 272-276.
| # Unwrap list inputs (SglangXvideo passes these as lists from CFG branches) |
| ref_tensor = torch.from_numpy(np.array(ref_img_pil.convert("RGB"))).to(device=device, dtype=dtype) | ||
| ref_tensor = (ref_tensor / 127.5 - 1.0).permute(2, 0, 1).unsqueeze(1).unsqueeze(0) | ||
| with torch.autocast(device_type="cuda", dtype=torch.float32): |
Hardcoded CUDA device type. torch.autocast(device_type="cuda", ...) will fail on non-CUDA devices (MPS, XPU, etc.). Use the device from the tensor:
| with torch.autocast(device_type="cuda", dtype=torch.float32): | |
| with torch.autocast(device_type=device.type, dtype=torch.float32): |
Same issue on line 670.
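A minimal sketch of the device-agnostic pattern, assuming a hypothetical helper named `autocast_for`. It uses `torch.bfloat16` on a CPU tensor only so the example runs anywhere; the dtype used by the pipeline may differ.

```python
import torch


def autocast_for(tensor: torch.Tensor, dtype: torch.dtype) -> torch.autocast:
    """Hypothetical helper: build an autocast context from the tensor's own device."""
    # tensor.device.type is "cpu", "cuda", "mps", "xpu", ... depending on where the tensor lives.
    return torch.autocast(device_type=tensor.device.type, dtype=dtype)


x = torch.randn(4, 4)  # CPU tensor in this example
device_type = x.device.type
with autocast_for(x, torch.bfloat16):
    y = x @ x
```

Reading the device type from the tensor keeps the call site identical across backends, so no branch on CUDA availability is needed.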
| device: torch.device, | ||
| generator: Optional[Union[torch.Generator, List[torch.Generator]]], | ||
| reference_images: Optional[List[List[Image.Image]]] = None, | ||
| enable_denormalization: bool = True, |
Unused parameter. enable_denormalization is declared but never read inside prepare_latents. Either use it or remove it from both prepare_latents and __call__.
| # Prepare timesteps — compute sigmas with single shift to match original scheduler | ||
| if timesteps is None and sigmas is None: | ||
| shift = getattr(self.scheduler.config, "shift", 1.0) |
Inlined scheduler math. Per .ai/pipelines.md gotcha #3, this manually computes shifted sigmas and temporarily overrides self.scheduler.shift — this is exactly what FlowMatchEulerDiscreteScheduler does with its shift config. Consider letting the scheduler own this:
self.scheduler.set_timesteps(num_inference_steps, device=device)
timesteps = self.scheduler.timesteps
| Output class for JoyImage Edit Plus multi-image editing pipelines. | ||
| """ | ||
| images: Union[List[PIL.Image.Image], np.ndarray] No newline at end of file |
Missing newline at end of file.
| images: Union[List[PIL.Image.Image], np.ndarray] | |
| images: Union[List[PIL.Image.Image], np.ndarray] |
| self._pad_sequence(negative_prompt_embeds_mask, max_seq_len), | ||
| self._pad_sequence(prompt_embeds_mask, max_seq_len), | ||
| ]) | ||
| torch.save(prompt_embeds, 'prompt_embeds_2.pt') |
Debug artifact. Remove torch.save(prompt_embeds, 'prompt_embeds_2.pt').
Description
We are the JoyAI Team, and this is the Diffusers implementation for the JoyAI-Image-Edit-Plus model.
GitHub Repository: [https://github.com/jd-opensource/JoyAI-Image]
Hugging Face Model: [https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers]
Original opensource weights: [https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus]
Model Overview
JoyAI-Image-Edit-Plus extends JoyAI-Image-Edit with multi-image editing capabilities. While JoyAI-Image-Edit operates on a single reference image, Edit-Plus accepts multiple reference images as input and performs instruction-guided editing across them, enabling tasks such as subject composition, style transfer from multiple sources, and multi-view consistent editing.
It combines an 8B Multimodal Large Language Model (MLLM) with a 16B Multimodal Diffusion Transformer (MMDiT), supporting variable-resolution reference images that are independently encoded and jointly denoised.
Key Features
and dog images).