
Flux2 port [in progress]#1133

Draft
Gnav3852 wants to merge 108 commits into hao-ai-lab:main from Gnav3852:gnav/flux2-port

Conversation

@Gnav3852

In-progress Flux2 port with debug files.

To run the debug files:
  • Install diffusers from source.
  • Place sglang at the same level as FastVideo in the pod.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Gnav3852, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the FastVideo library's capabilities by adding full support for the Flux2 image generation model, including the distilled Klein variant. It integrates the Flux2 DiT transformer, VAE, and Qwen3 text encoder, along with specialized pipeline stages tailored for Flux2's unique data formats and processing requirements. A key aspect of this PR is the inclusion of extensive comparison and debugging utilities, allowing for rigorous validation against official implementations and facilitating detailed analysis of model behavior. These additions ensure high fidelity and robust performance for Flux2 models within the FastVideo ecosystem.

Highlights

  • Flux2 Model Integration: Introduced comprehensive support for the Flux2 image generation model, including its Klein variant, by integrating its DiT transformer, VAE, and Qwen3 text encoder into the FastVideo framework.
  • Comparison and Debugging Tools: Added a suite of Python scripts for detailed comparison and debugging of Flux2 components (DiT, VAE, Text Encoder) against official Diffusers and SGLang implementations, including block-by-block and end-to-end image comparisons.
  • Pipeline Stage Enhancements: Implemented Flux2-specific latent and timestep preparation stages to correctly handle its unique packed latent format and resolution-dependent timestep scaling. Also improved VAE decoding logic for Flux2's specific denormalization and unpatchifying process.
  • Configuration and Compatibility Improvements: Updated model configuration loading to gracefully handle unknown keys from HuggingFace configs and enabled inference with precomputed text embeddings, enhancing flexibility and compatibility.
  • Rotary Embedding and LayerNorm Refinements: Refined the apply_rotary_emb function to support flexible sequence dimensions and corrected RMSNorm's forward_native method for accurate type casting with weights.
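The "resolution-dependent timestep scaling" mentioned above can be sketched as follows. This is an illustrative reconstruction of the common Flux-style shift schedule (as in diffusers' `calculate_shift` for FLUX.1), not the exact code of this PR's `compute_empirical_mu`; the helper names and the constants (base/max sequence length, base/max shift) are assumptions:

```python
import math

def compute_empirical_mu_sketch(image_seq_len: int,
                                base_seq_len: int = 256,
                                max_seq_len: int = 4096,
                                base_shift: float = 0.5,
                                max_shift: float = 1.15) -> float:
    """Linearly interpolate the scheduler shift `mu` from the packed-latent
    sequence length: larger images get a larger shift. The constants are
    illustrative defaults, not values taken from this PR."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b

def shift_timesteps(sigmas: list[float], mu: float) -> list[float]:
    """Apply the exponential time shift used by flow-matching schedulers;
    mu = 0 leaves the schedule unchanged."""
    return [math.exp(mu) / (math.exp(mu) + (1.0 / s - 1.0)) for s in sigmas]
```

At the endpoints the interpolation returns the base and max shifts exactly, so a 256-token image keeps a mild shift while a 4096-token image is pushed toward noisier timesteps.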


Changelog
  • compare_flux2_dit.py
    • Added a script to compare FastVideo's Flux2 DiT output with the official Diffusers step-0 output.
  • compare_flux2_dit_blocks.py
    • Added a script for block-by-block comparison of FastVideo's Flux2 DiT against the official Diffusers implementation, including detailed intermediate activation comparisons.
  • compare_flux2_dit_sglang.py
    • Added a script to compare FastVideo's Flux2 DiT output with SGLang's transformer output.
  • compare_flux2_e2e_ssim.py
    • Added an end-to-end image comparison script using SSIM/PSNR for FastVideo, Diffusers, and SGLang Flux2 Klein.
  • compare_flux2_text_encoder_sglang.py
    • Added a script to compare FastVideo's Flux2 Klein text encoder outputs against Diffusers or SGLang.
  • compare_flux2_text_encoder_three_way.py
    • Added a script for a three-way comparison of FastVideo, SGLang, and Diffusers Flux2 Klein text encoder outputs.
  • compare_flux2_vae_sglang.py
    • Added a script to compare FastVideo's Flux2 VAE decode against Diffusers or SGLang.
  • debug_text_encoder_sglang.py
    • Added a script to debug text encoder divergences between FastVideo and SGLang layer-by-layer.
  • dump_flux2_step0.py
    • Added a utility script to dump step-0 inputs and official transformer output from Flux2KleinPipeline.
  • dump_sglang_flux2_step0.py
    • Added a utility script to dump step-0 inputs and SGLang transformer output for Flux2 Klein.
  • fastvideo/configs/models/base.py
    • Updated update_model_arch to skip unknown keys instead of raising an error, improving compatibility with HuggingFace configs.
  • fastvideo/configs/models/dits/__init__.py
    • Added Flux2Config to the __all__ export list.
  • fastvideo/configs/models/dits/flux_2.py
    • Added new configuration classes Flux2ArchConfig and Flux2Config for Flux2 DiT models.
  • fastvideo/configs/models/encoders/__init__.py
    • Added Qwen3TextConfig to the __all__ export list.
  • fastvideo/configs/models/encoders/qwen3.py
    • Added new configuration classes Qwen3TextArchConfig and Qwen3TextConfig for the Qwen3 text encoder.
  • fastvideo/configs/models/vaes/__init__.py
    • Added Flux2VAEConfig to the __all__ export list.
  • fastvideo/configs/models/vaes/flux2vae.py
    • Added new configuration classes Flux2VAEArchConfig and Flux2VAEConfig for Flux2 VAE models.
  • fastvideo/configs/pipelines/flux_2.py
    • Added new pipeline configuration classes Flux2PipelineConfig and Flux2KleinPipelineConfig for Flux2 models, including specific text encoder post-processing.
  • fastvideo/configs/pipelines/registry.py
    • Updated PIPE_NAME_TO_CONFIG and PIPELINE_DETECTOR to include Flux2 and Flux2 Klein pipelines.
  • fastvideo/configs/sample/flux_2.py
    • Added new sampling parameter classes Flux2SamplingParam and Flux2KleinSamplingParam.
  • fastvideo/configs/sample/registry.py
    • Updated SAMPLING_PARAM_REGISTRY and SAMPLING_FALLBACK_PARAM to include Flux2 and Flux2 Klein sampling parameters.
  • fastvideo/entrypoints/video_generator.py
    • Modified _generate_single_video to allow precomputed prompt_embeds to skip text encoding.
  • fastvideo/layers/layernorm.py
    • Corrected RMSNorm forward_native to apply to(orig_dtype) after scaling with weight, if present.
  • fastvideo/layers/rotary_embedding.py
    • Modified apply_rotary_emb to accept a sequence_dim argument for flexible broadcasting of freqs_cis.
  • fastvideo/models/dits/flux_2.py
    • Added the Flux2Transformer2DModel implementation, including Flux2SwiGLU, Flux2FeedForward, Flux2Attention, Flux2ParallelSelfAttention, Flux2SingleTransformerBlock, Flux2TransformerBlock, Flux2TimestepGuidanceEmbeddings, Flux2Modulation, and Flux2PosEmbed.
  • fastvideo/models/encoders/qwen3.py
    • Added the Qwen3ForCausalLM text encoder implementation, including Qwen3MLP, Qwen3Attention, and Qwen3DecoderLayer.
  • fastvideo/models/loader/component_loader.py
    • Added _collect_safetensors_keys and modified TransformerLoader to infer num_layers and num_single_layers from checkpoint keys for Flux2 models.
  • fastvideo/models/registry.py
    • Updated _IMAGE_TO_VIDEO_DIT_MODELS, _TEXT_ENCODER_MODELS, and _VAE_MODELS to register Flux2 and Qwen3 components.
  • fastvideo/models/vaes/flux2vae.py
    • Added the AutoencoderKLFlux2 VAE model implementation, inheriting from ParallelTiledVAE.
  • fastvideo/pipelines/basic/flux_2/__init__.py
    • Added Flux2Pipeline and Flux2KleinPipeline to the __all__ export list.
  • fastvideo/pipelines/basic/flux_2/flux_2_klein_pipeline.py
    • Added the Flux2KleinPipeline class, inheriting from Flux2Pipeline.
  • fastvideo/pipelines/basic/flux_2/flux_2_latent_preparation.py
    • Added Flux2LatentPreparationStage for Flux2-specific packed latent handling.
  • fastvideo/pipelines/basic/flux_2/flux_2_pipeline.py
    • Added the Flux2Pipeline class, defining its stages for Flux2 image generation.
  • fastvideo/pipelines/basic/flux_2/flux_2_timestep_preparation.py
    • Added Flux2TimestepPreparationStage for Flux2-specific timestep preparation, including compute_empirical_mu.
  • fastvideo/pipelines/composed_pipeline_base.py
    • Modified load_modules to pop is_distilled from model_index as it's HF metadata.
  • fastvideo/pipelines/stages/decoding.py
    • Modified _denormalize_latents to handle scaling_factor and shift_factor more robustly, added _unpatchify_latents and _flux2_bn_denorm_and_unpatchify for Flux2's packed latent format, and integrated debug_nan_check.
  • fastvideo/pipelines/stages/denoising.py
    • Integrated debug_nan_check for debugging during the denoising process.
  • fastvideo/pipelines/stages/image_encoding.py
    • Modified forward to return early if pil_image is None, and updated verify_output to allow image_latent to be None.
  • fastvideo/pipelines/stages/text_encoding.py
    • Modified forward to skip text encoding if precomputed prompt_embeds are provided.
  • fastvideo/pipelines/stages/utils.py
    • Added debug_nan_check utility function.
  • fastvideo/registry.py
    • Updated _register_configs to include Flux2 and Flux2 Klein pipeline and sampling parameter configurations.
  • run_flux2_official.py
    • Added a script to run the official FLUX.2 Klein 4B model via Diffusers for comparison.
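The RMSNorm fix noted for fastvideo/layers/layernorm.py (cast back to the original dtype only after scaling by the weight) can be illustrated with a minimal NumPy sketch. The function name and eps default are assumptions for illustration, not the PR's code:

```python
from typing import Optional

import numpy as np

def rms_norm_native(x: np.ndarray,
                    weight: Optional[np.ndarray],
                    eps: float = 1e-6) -> np.ndarray:
    """Compute RMSNorm in float32, multiply by the weight while still in
    float32, and only then cast back to the input dtype. Casting before the
    weight multiply would round twice and lose precision for half-precision
    inputs, which is the bug the PR's fix addresses."""
    orig_dtype = x.dtype
    x32 = x.astype(np.float32)
    x32 = x32 / np.sqrt(np.mean(x32 * x32, axis=-1, keepdims=True) + eps)
    if weight is not None:
        x32 = x32 * weight.astype(np.float32)
    return x32.astype(orig_dtype)
```

The key property is that the output dtype always matches the input dtype, while every intermediate (including the weight multiply) stays in float32.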
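The packed-latent handling noted for fastvideo/pipelines/stages/decoding.py can likewise be sketched. This assumes the common Flux-style layout in which each sequence token carries a 2x2 spatial latent patch; the exact layout and the helper names here are assumptions, not the PR's implementation:

```python
import numpy as np

def unpatchify_latents(packed: np.ndarray, height: int, width: int,
                       channels: int) -> np.ndarray:
    """Rebuild (B, C, H, W) latents from a packed (B, H/2 * W/2, C * 4)
    sequence, where each token holds a 2x2 spatial patch."""
    b = packed.shape[0]
    h2, w2 = height // 2, width // 2
    x = packed.reshape(b, h2, w2, channels, 2, 2)  # split token and patch dims
    x = x.transpose(0, 3, 1, 4, 2, 5)              # (B, C, h2, 2, w2, 2)
    return x.reshape(b, channels, height, width)

def denormalize_latents(latents: np.ndarray, scaling_factor: float,
                        shift_factor: float | None = None) -> np.ndarray:
    """Invert VAE latent normalization: divide by the scaling factor, then
    add the shift factor when the VAE config defines one."""
    latents = latents / scaling_factor
    if shift_factor is not None:
        latents = latents + shift_factor
    return latents
```

A useful sanity check is that packing (the inverse reshape/transpose) followed by `unpatchify_latents` is an exact round trip on the latent tensor.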

@Gnav3852 changed the title from "Flux2 port" to "Flux2 port [in progress]" on Feb 26, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for the Flux2 model, including its Klein variant. The changes are comprehensive, covering model implementations (DiT, VAE, text encoder), pipeline configurations, and new pipeline stages to handle Flux2's specific architecture, such as packed latents and dynamic shifting for the scheduler. A suite of utility scripts for debugging and comparing implementations against diffusers and SGLang is also included, which is very helpful for validation.

The core implementation changes appear solid. The model loading is made more robust, and mixed-precision handling in RMSNorm is improved. The new pipeline stages for Flux2 are well-integrated.

I have one comment on a duplicated function in one of the new comparison scripts. Otherwise, the port looks good.

Comment on lines +98 to +118
def get_diffusers_prompt_embeds(
    prompt: str,
    model_id: str,
    device: str = "cuda",
    dtype: torch.dtype = torch.bfloat16,
) -> torch.Tensor:
    """Get Flux2 Klein prompt_embeds from diffusers (layers 9, 18, 27). Returns tensor on device."""
    try:
        from diffusers import Flux2KleinPipeline
    except ImportError:
        from diffusers.pipelines.flux2 import Flux2KleinPipeline
    pipe = Flux2KleinPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)
    prompt_embeds, _ = pipe.encode_prompt(
        prompt=prompt,
        device=device,
        num_images_per_prompt=1,
        max_sequence_length=512,
        text_encoder_out_layers=(9, 18, 27),
    )
    return prompt_embeds

Severity: medium

The function get_diffusers_prompt_embeds is defined here and then again at line 256. This redefinition is likely unintentional and makes the code confusing. Although neither of these functions appears to be called in the current script, this duplication should be resolved to improve code clarity and prevent potential bugs if this code is used in the future. I suggest removing this first definition, and potentially the second one as well if it's confirmed to be dead code.

@jzhang38 marked this pull request as draft on February 26, 2026
