
Flux2 port [in progress]#1133

Draft
Gnav3852 wants to merge 108 commits into hao-ai-lab:main from Gnav3852:gnav/flux2-port

Conversation

@Gnav3852

In-progress Flux2 port with debug files.

To run the debug files:
  • Install diffusers from source.
  • Place sglang at the same level as FastVideo in the pod.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Gnav3852, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the FastVideo library's capabilities by adding full support for the Flux2 image generation model, including the distilled Klein variant. It integrates the Flux2 DiT transformer, VAE, and Qwen3 text encoder, along with specialized pipeline stages tailored for Flux2's unique data formats and processing requirements. A key aspect of this PR is the inclusion of extensive comparison and debugging utilities, allowing for rigorous validation against official implementations and facilitating detailed analysis of model behavior. These additions ensure high fidelity and robust performance for Flux2 models within the FastVideo ecosystem.

Highlights

  • Flux2 Model Integration: Introduced comprehensive support for the Flux2 image generation model, including its Klein variant, by integrating its DiT transformer, VAE, and Qwen3 text encoder into the FastVideo framework.
  • Comparison and Debugging Tools: Added a suite of Python scripts for detailed comparison and debugging of Flux2 components (DiT, VAE, Text Encoder) against official Diffusers and SGLang implementations, including block-by-block and end-to-end image comparisons.
  • Pipeline Stage Enhancements: Implemented Flux2-specific latent and timestep preparation stages to correctly handle its unique packed latent format and resolution-dependent timestep scaling. Also improved VAE decoding logic for Flux2's specific denormalization and unpatchifying process.
  • Configuration and Compatibility Improvements: Updated model configuration loading to gracefully handle unknown keys from HuggingFace configs and enabled inference with precomputed text embeddings, enhancing flexibility and compatibility.
  • Rotary Embedding and LayerNorm Refinements: Refined the apply_rotary_emb function to support flexible sequence dimensions and corrected RMSNorm's forward_native method for accurate type casting with weights.
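The "resolution-dependent timestep scaling" mentioned above can be sketched as follows. This is an illustrative reconstruction of the common Flux-style shift schedule (as in diffusers' `calculate_shift` for FLUX.1), not the exact code of this PR's `compute_empirical_mu`; the helper names and the constants (base/max sequence length, base/max shift) are assumptions:

```python
import math

def compute_empirical_mu_sketch(image_seq_len: int,
                                base_seq_len: int = 256,
                                max_seq_len: int = 4096,
                                base_shift: float = 0.5,
                                max_shift: float = 1.15) -> float:
    """Linearly interpolate the scheduler shift `mu` from the packed-latent
    sequence length: larger images get a larger shift. The constants are
    illustrative defaults, not values taken from this PR."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b

def shift_timesteps(sigmas: list[float], mu: float) -> list[float]:
    """Apply the exponential time shift used by flow-matching schedulers;
    mu = 0 leaves the schedule unchanged."""
    return [math.exp(mu) / (math.exp(mu) + (1.0 / s - 1.0)) for s in sigmas]
```

At the endpoints the interpolation returns the base and max shifts exactly, so a 256-token image keeps a mild shift while a 4096-token image is pushed toward noisier timesteps.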


Changelog
  • compare_flux2_dit.py
    • Added a script to compare FastVideo's Flux2 DiT output with the official Diffusers step-0 output.
  • compare_flux2_dit_blocks.py
    • Added a script for block-by-block comparison of FastVideo's Flux2 DiT against the official Diffusers implementation, including detailed intermediate activation comparisons.
  • compare_flux2_dit_sglang.py
    • Added a script to compare FastVideo's Flux2 DiT output with SGLang's transformer output.
  • compare_flux2_e2e_ssim.py
    • Added an end-to-end image comparison script using SSIM/PSNR for FastVideo, Diffusers, and SGLang Flux2 Klein.
  • compare_flux2_text_encoder_sglang.py
    • Added a script to compare FastVideo's Flux2 Klein text encoder outputs against Diffusers or SGLang.
  • compare_flux2_text_encoder_three_way.py
    • Added a script for a three-way comparison of FastVideo, SGLang, and Diffusers Flux2 Klein text encoder outputs.
  • compare_flux2_vae_sglang.py
    • Added a script to compare FastVideo's Flux2 VAE decode against Diffusers or SGLang.
  • debug_text_encoder_sglang.py
    • Added a script to debug text encoder divergences between FastVideo and SGLang layer-by-layer.
  • dump_flux2_step0.py
    • Added a utility script to dump step-0 inputs and official transformer output from Flux2KleinPipeline.
  • dump_sglang_flux2_step0.py
    • Added a utility script to dump step-0 inputs and SGLang transformer output for Flux2 Klein.
  • fastvideo/configs/models/base.py
    • Updated update_model_arch to skip unknown keys instead of raising an error, improving compatibility with HuggingFace configs.
  • fastvideo/configs/models/dits/__init__.py
    • Added Flux2Config to the __all__ export list.
  • fastvideo/configs/models/dits/flux_2.py
    • Added new configuration classes Flux2ArchConfig and Flux2Config for Flux2 DiT models.
  • fastvideo/configs/models/encoders/__init__.py
    • Added Qwen3TextConfig to the __all__ export list.
  • fastvideo/configs/models/encoders/qwen3.py
    • Added new configuration classes Qwen3TextArchConfig and Qwen3TextConfig for the Qwen3 text encoder.
  • fastvideo/configs/models/vaes/__init__.py
    • Added Flux2VAEConfig to the __all__ export list.
  • fastvideo/configs/models/vaes/flux2vae.py
    • Added new configuration classes Flux2VAEArchConfig and Flux2VAEConfig for Flux2 VAE models.
  • fastvideo/configs/pipelines/flux_2.py
    • Added new pipeline configuration classes Flux2PipelineConfig and Flux2KleinPipelineConfig for Flux2 models, including specific text encoder post-processing.
  • fastvideo/configs/pipelines/registry.py
    • Updated PIPE_NAME_TO_CONFIG and PIPELINE_DETECTOR to include Flux2 and Flux2 Klein pipelines.
  • fastvideo/configs/sample/flux_2.py
    • Added new sampling parameter classes Flux2SamplingParam and Flux2KleinSamplingParam.
  • fastvideo/configs/sample/registry.py
    • Updated SAMPLING_PARAM_REGISTRY and SAMPLING_FALLBACK_PARAM to include Flux2 and Flux2 Klein sampling parameters.
  • fastvideo/entrypoints/video_generator.py
    • Modified _generate_single_video to allow precomputed prompt_embeds to skip text encoding.
  • fastvideo/layers/layernorm.py
    • Corrected RMSNorm forward_native to apply to(orig_dtype) after scaling with weight, if present.
  • fastvideo/layers/rotary_embedding.py
    • Modified apply_rotary_emb to accept a sequence_dim argument for flexible broadcasting of freqs_cis.
  • fastvideo/models/dits/flux_2.py
    • Added the Flux2Transformer2DModel implementation, including Flux2SwiGLU, Flux2FeedForward, Flux2Attention, Flux2ParallelSelfAttention, Flux2SingleTransformerBlock, Flux2TransformerBlock, Flux2TimestepGuidanceEmbeddings, Flux2Modulation, and Flux2PosEmbed.
  • fastvideo/models/encoders/qwen3.py
    • Added the Qwen3ForCausalLM text encoder implementation, including Qwen3MLP, Qwen3Attention, and Qwen3DecoderLayer.
  • fastvideo/models/loader/component_loader.py
    • Added _collect_safetensors_keys and modified TransformerLoader to infer num_layers and num_single_layers from checkpoint keys for Flux2 models.
  • fastvideo/models/registry.py
    • Updated _IMAGE_TO_VIDEO_DIT_MODELS, _TEXT_ENCODER_MODELS, and _VAE_MODELS to register Flux2 and Qwen3 components.
  • fastvideo/models/vaes/flux2vae.py
    • Added the AutoencoderKLFlux2 VAE model implementation, inheriting from ParallelTiledVAE.
  • fastvideo/pipelines/basic/flux_2/__init__.py
    • Added Flux2Pipeline and Flux2KleinPipeline to the __all__ export list.
  • fastvideo/pipelines/basic/flux_2/flux_2_klein_pipeline.py
    • Added the Flux2KleinPipeline class, inheriting from Flux2Pipeline.
  • fastvideo/pipelines/basic/flux_2/flux_2_latent_preparation.py
    • Added Flux2LatentPreparationStage for Flux2-specific packed latent handling.
  • fastvideo/pipelines/basic/flux_2/flux_2_pipeline.py
    • Added the Flux2Pipeline class, defining its stages for Flux2 image generation.
  • fastvideo/pipelines/basic/flux_2/flux_2_timestep_preparation.py
    • Added Flux2TimestepPreparationStage for Flux2-specific timestep preparation, including compute_empirical_mu.
  • fastvideo/pipelines/composed_pipeline_base.py
    • Modified load_modules to pop is_distilled from model_index as it's HF metadata.
  • fastvideo/pipelines/stages/decoding.py
    • Modified _denormalize_latents to handle scaling_factor and shift_factor more robustly, added _unpatchify_latents and _flux2_bn_denorm_and_unpatchify for Flux2's packed latent format, and integrated debug_nan_check.
  • fastvideo/pipelines/stages/denoising.py
    • Integrated debug_nan_check for debugging during the denoising process.
  • fastvideo/pipelines/stages/image_encoding.py
    • Modified forward to return early if pil_image is None, and updated verify_output to allow image_latent to be None.
  • fastvideo/pipelines/stages/text_encoding.py
    • Modified forward to skip text encoding if precomputed prompt_embeds are provided.
  • fastvideo/pipelines/stages/utils.py
    • Added debug_nan_check utility function.
  • fastvideo/registry.py
    • Updated _register_configs to include Flux2 and Flux2 Klein pipeline and sampling parameter configurations.
  • run_flux2_official.py
    • Added a script to run the official FLUX.2 Klein 4B model via Diffusers for comparison.
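The RMSNorm fix noted for fastvideo/layers/layernorm.py (cast back to the original dtype only after scaling by the weight) can be illustrated with a minimal NumPy sketch. The function name and eps default are assumptions for illustration, not the PR's code:

```python
from typing import Optional

import numpy as np

def rms_norm_native(x: np.ndarray,
                    weight: Optional[np.ndarray],
                    eps: float = 1e-6) -> np.ndarray:
    """Compute RMSNorm in float32, multiply by the weight while still in
    float32, and only then cast back to the input dtype. Casting before the
    weight multiply would round twice and lose precision for half-precision
    inputs, which is the bug the PR's fix addresses."""
    orig_dtype = x.dtype
    x32 = x.astype(np.float32)
    x32 = x32 / np.sqrt(np.mean(x32 * x32, axis=-1, keepdims=True) + eps)
    if weight is not None:
        x32 = x32 * weight.astype(np.float32)
    return x32.astype(orig_dtype)
```

The key property is that the output dtype always matches the input dtype, while every intermediate (including the weight multiply) stays in float32.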
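The packed-latent handling noted for fastvideo/pipelines/stages/decoding.py can likewise be sketched. This assumes the common Flux-style layout in which each sequence token carries a 2x2 spatial latent patch; the exact layout and the helper names here are assumptions, not the PR's implementation:

```python
import numpy as np

def unpatchify_latents(packed: np.ndarray, height: int, width: int,
                       channels: int) -> np.ndarray:
    """Rebuild (B, C, H, W) latents from a packed (B, H/2 * W/2, C * 4)
    sequence, where each token holds a 2x2 spatial patch."""
    b = packed.shape[0]
    h2, w2 = height // 2, width // 2
    x = packed.reshape(b, h2, w2, channels, 2, 2)  # split token and patch dims
    x = x.transpose(0, 3, 1, 4, 2, 5)              # (B, C, h2, 2, w2, 2)
    return x.reshape(b, channels, height, width)

def denormalize_latents(latents: np.ndarray, scaling_factor: float,
                        shift_factor: float | None = None) -> np.ndarray:
    """Invert VAE latent normalization: divide by the scaling factor, then
    add the shift factor when the VAE config defines one."""
    latents = latents / scaling_factor
    if shift_factor is not None:
        latents = latents + shift_factor
    return latents
```

A useful sanity check is that packing (the inverse reshape/transpose) followed by `unpatchify_latents` is an exact round trip on the latent tensor.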

@Gnav3852 changed the title from "Flux2 port" to "Flux2 port [in progress]" on Feb 26, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for the Flux2 model, including its Klein variant. The changes are comprehensive, covering model implementations (DiT, VAE, text encoder), pipeline configurations, and new pipeline stages to handle Flux2's specific architecture, such as packed latents and dynamic shifting for the scheduler. A suite of utility scripts for debugging and comparing implementations against diffusers and SGLang is also included, which is very helpful for validation.

The core implementation changes appear solid. The model loading is made more robust, and mixed-precision handling in RMSNorm is improved. The new pipeline stages for Flux2 are well-integrated.

I have one comment on a duplicated function in one of the new comparison scripts. Otherwise, the port looks good.

Comment on lines +98 to +118
def get_diffusers_prompt_embeds(
    prompt: str,
    model_id: str,
    device: str = "cuda",
    dtype: torch.dtype = torch.bfloat16,
) -> torch.Tensor:
    """Get Flux2 Klein prompt_embeds from diffusers (layers 9, 18, 27). Returns tensor on device."""
    try:
        from diffusers import Flux2KleinPipeline
    except ImportError:
        from diffusers.pipelines.flux2 import Flux2KleinPipeline
    pipe = Flux2KleinPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)
    prompt_embeds, _ = pipe.encode_prompt(
        prompt=prompt,
        device=device,
        num_images_per_prompt=1,
        max_sequence_length=512,
        text_encoder_out_layers=(9, 18, 27),
    )
    return prompt_embeds

Severity: medium

The function get_diffusers_prompt_embeds is defined here and then again at line 256. This redefinition is likely unintentional and makes the code confusing. Although neither of these functions appears to be called in the current script, this duplication should be resolved to improve code clarity and prevent potential bugs if this code is used in the future. I suggest removing this first definition, and potentially the second one as well if it's confirmed to be dead code.

@jzhang38 marked this pull request as draft on February 26, 2026
