[discrete diffusion] Add DiffusionGemma pipeline and schedulers#13986

Open: kashif wants to merge 20 commits into huggingface:main from kashif:diffusion-gemma-schedulers

@kashif kashif commented Jun 18, 2026

Contributor

Adds a DiffusionGemma block-diffusion pipeline, alongside the schedulers already on this branch (discrete DDIM, entropy bound, and a uniform mode for block refinement).

DiffusionGemma is an encoder-decoder block-diffusion model: the encoder reads the prompt into a KV cache and the decoder denoises a fixed-size canvas by cross-attending to it. The pipeline runs the outer canvas loop and the inner denoising loop, sampling candidates each step, committing the most confident ones via BlockRefinementScheduler in uniform corruption mode, and renoising the rest. Structure mirrors the LLaDA2 and dflash (#13699) pipelines.
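The two nested loops described above can be sketched with a toy stand-in for the decoder. This is only an illustration under stated assumptions: `toy_denoiser`, `denoise_canvas`, `generate`, and the linear commit schedule are hypothetical and are not the pipeline's or BlockRefinementScheduler's actual API. The real decoder also cross-attends to the encoder's KV cache, which is omitted here.

```python
import random


def toy_denoiser(canvas, vocab_size, rng):
    """Hypothetical stand-in for the decoder: one (candidate, confidence) per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in canvas]


def denoise_canvas(canvas_len, vocab_size, num_steps, rng):
    """Inner denoising loop over a single fixed-size canvas."""
    # Uniform corruption: start from uniformly random tokens rather than a mask token.
    canvas = [rng.randrange(vocab_size) for _ in range(canvas_len)]
    committed = [False] * canvas_len
    for step in range(num_steps):
        candidates = toy_denoiser(canvas, vocab_size, rng)
        # Assumed linear schedule: total positions committed by the end of this step.
        target = (canvas_len * (step + 1)) // num_steps
        n_commit = target - sum(committed)
        open_positions = [i for i in range(canvas_len) if not committed[i]]
        open_positions.sort(key=lambda i: candidates[i][1], reverse=True)
        for i in open_positions[:n_commit]:
            # Commit the most confident candidates.
            canvas[i] = candidates[i][0]
            committed[i] = True
        for i in open_positions[n_commit:]:
            # Renoise everything that was not committed.
            canvas[i] = rng.randrange(vocab_size)
    return canvas, committed


def generate(num_canvases, canvas_len, vocab_size, num_steps, seed=0):
    """Outer canvas loop: denoise one canvas after another and concatenate them."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(num_canvases):
        canvas, committed = denoise_canvas(canvas_len, vocab_size, num_steps, rng)
        assert all(committed)
        tokens.extend(canvas)
    return tokens
```

With this schedule every position is committed by the final step, so the number of denoising steps bounds the work per canvas regardless of canvas length.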

The model itself lives in transformers as DiffusionGemmaForBlockDiffusion (released in 5.12.0).

Tested:

  • pipeline unit tests pass (plumbing, callbacks, output types)
  • the pipeline drives the real tiny checkpoint end to end without error

Quality on the full google/diffusiongemma-26B-A4B-it checkpoint still needs a GPU run.

@github-actions github-actions bot added the size/L (PR with diff > 200 LOC), documentation (Improvements or additions to documentation), tests, utils, pipelines, and schedulers labels on Jun 18, 2026
@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp left a comment

Member

Looking great! A couple of questions from a quick skim.

Comment thread src/diffusers/pipelines/diffusion_gemma/pipeline_diffusion_gemma.py Outdated
Comment thread src/diffusers/pipelines/diffusion_gemma/pipeline_diffusion_gemma.py Outdated
Comment thread src/diffusers/pipelines/diffusion_gemma/pipeline_diffusion_gemma.py Outdated
@kashif kashif mentioned this pull request Jun 18, 2026
6 tasks

@yiyixuxu yiyixuxu left a comment

Collaborator

thanks for the PR! i left a few comments

I reviewed this through the lens of diffusers conventions and style. If some of these choices are intentional to keep things familiar for Transformers users, let me know, and we can figure out the right balance together.

def __call__(
    self,
    prompt: str | list[str] | None = None,
    messages: list[dict[str, str]] | None = None,

Collaborator

I think that between prompt and messages, we only need to accept prompt, since it's really cheap to convert into messages

it's just this, no?

messages = [{"role": "user", "content": prompt}]
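For batched prompts the same conversion extends naturally. The helper below is a hypothetical sketch (the name `prompts_to_messages` is not from the PR), assuming each prompt becomes its own single-turn conversation:

```python
def prompts_to_messages(prompt):
    """Wrap a prompt, or a list of prompts, as single-turn user conversations.

    Hypothetical helper for illustration; returns one conversation per prompt.
    """
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    return [[{"role": "user", "content": p}] for p in prompts]
```

This covers text-only input; image prompts would need a richer content structure, which is the wrinkle raised in the reply below.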

Contributor Author

Makes sense. The one wrinkle is image prompts, which we pass through messages today, so I'll fold the prompt/messages simplification into the image input rework so single-image and text both stay clean. Coming in a follow-up.

Contributor Author

Made prompt the primary input and dropped the tokenized intermediates. Kept messages for raw multi-turn/multimodal conversations (per the thread below with zucchini), and added a raw image arg for the simple prompt+image case, so it is all raw inputs now.

Comment thread src/diffusers/pipelines/diffusion_gemma/pipeline_diffusion_gemma.py Outdated
Comment thread src/diffusers/pipelines/diffusion_gemma/pipeline_diffusion_gemma.py Outdated
kashif and others added 4 commits June 19, 2026 09:35
Adds optional Gibbs corrector sweeps after each predictor step for
uniform diffusion, recovering the LOO denoiser in closed form so it
works on the released checkpoint with no retraining.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
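The commit message does not spell out the closed form. One way such an identity can arise is sketched below, assuming the corruption acts independently per token, keeping the clean token with probability `alpha_t` and otherwise resampling uniformly over the vocabulary. Under that assumption, Bayes' rule gives p(x0_i = v | x_t) ∝ p(x0_i = v | x_t without position i) · q(x_t_i | x0_i = v), so dividing the denoiser's per-token posterior by the corruption likelihood and renormalizing yields the leave-one-out (LOO) distribution. The function name and parameters are hypothetical, and the PR's actual derivation may differ.

```python
def loo_from_posterior(posterior, observed_token, alpha_t):
    """Recover a leave-one-out distribution from a per-token posterior.

    Assumes uniform corruption: q(x_t_i | x0_i = v) equals
    alpha_t * [v == observed_token] + (1 - alpha_t) / vocab_size.
    Requires alpha_t < 1 so the likelihood is never zero.
    """
    vocab_size = len(posterior)
    uniform_part = (1.0 - alpha_t) / vocab_size
    unnormalized = []
    for v, p in enumerate(posterior):
        likelihood = uniform_part + (alpha_t if v == observed_token else 0.0)
        unnormalized.append(p / likelihood)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

A Gibbs corrector sweep would then resample each position in turn from its LOO distribution while holding the other positions fixed.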
@kashif kashif requested a review from dg845 June 20, 2026 08:16
The denoiser is a Transformers model, so adapters (LoRA, DoRA, ...) load
through its native PEFT integration rather than the diffusers LoRA loader.
Also dispatch the predictor-corrector by scheduler capability instead of class.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
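Dispatching by capability rather than by class can be sketched as follows. The class names, the `correct` method, and `run_step` are hypothetical and chosen only for illustration; the PR's real scheduler interface is not shown in this thread.

```python
class PredictorOnlyScheduler:
    """Hypothetical scheduler that only offers a predictor step."""

    def step(self, canvas):
        return canvas + ["predict"]


class CorrectingScheduler:
    """Hypothetical, unrelated class that also offers a corrector."""

    def step(self, canvas):
        return canvas + ["predict"]

    def correct(self, canvas):
        return canvas + ["correct"]


def run_step(scheduler, canvas, num_corrector_sweeps=0):
    """Run the predictor, then corrector sweeps only if the scheduler supports them."""
    canvas = scheduler.step(canvas)
    # Capability check: look for the method instead of testing isinstance().
    corrector = getattr(scheduler, "correct", None)
    if callable(corrector):
        for _ in range(num_corrector_sweeps):
            canvas = corrector(canvas)
    return canvas
```

The benefit of this design is that any scheduler exposing the method participates in the predictor-corrector path without the pipeline having to know its class.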
@kashif kashif requested a review from sayakpaul June 20, 2026 10:29
Build callback_kwargs with a loop instead of a dict comprehension, whose
own scope hides locals() on pre-3.12 (PEP 709), causing KeyError: 'canvas'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
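The scoping issue behind this fix can be illustrated with a small sketch. The function and variable names are hypothetical, not the pipeline's actual code. Before Python 3.12, a comprehension runs in its own nested scope, so calling `locals()` inside it does not return the enclosing function's variables; a plain loop runs in the function's scope and does.

```python
def build_callback_kwargs(tensor_names):
    """Collect the named local variables into a dict for a step callback."""
    canvas = "canvas-value"  # placeholder for the real canvas tensor
    step = 3
    # Fragile on Python < 3.12: {k: locals()[k] for k in tensor_names}
    # would look names up in the comprehension's own scope and raise KeyError.
    scope = locals()  # captured in the function's scope, so canvas and step are visible
    callback_kwargs = {}
    for name in tensor_names:
        callback_kwargs[name] = scope[name]
    return callback_kwargs
```

The loop form behaves the same on every supported Python version, which is why it is the safer choice here.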

4 participants