Conversation
Summary of Changes

Hello @Gnav3852, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the FastVideo library's capabilities by adding full support for the Flux2 image generation model, including the distilled Klein variant. It integrates the Flux2 DiT transformer, VAE, and Qwen3 text encoder, along with specialized pipeline stages tailored for Flux2's data formats and processing requirements. A key aspect of this PR is the inclusion of extensive comparison and debugging utilities, which allow validation against the official implementations and detailed analysis of model behavior within the FastVideo ecosystem.
Code Review
This pull request introduces support for the Flux2 model, including its Klein variant. The changes are comprehensive, covering model implementations (DiT, VAE, text encoder), pipeline configurations, and new pipeline stages to handle Flux2's specific architecture, such as packed latents and dynamic shifting for the scheduler. A suite of utility scripts for debugging and comparing implementations against diffusers and SGLang is also included, which is very helpful for validation.
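To make the "packed latents" point concrete: Flux-style DiTs typically consume VAE latents as a token sequence, where each 2x2 spatial patch of the latent grid becomes one sequence element. The sketch below illustrates that packing in NumPy; the function names and exact layout are illustrative assumptions, not the PR's actual pipeline-stage code.

```python
import numpy as np

def pack_latents(latents: np.ndarray) -> np.ndarray:
    """Pack (B, C, H, W) VAE latents into (B, H//2 * W//2, C * 4) tokens.

    Each 2x2 spatial patch becomes one sequence element, the form in which
    Flux-style DiT transformers consume image latents.
    """
    b, c, h, w = latents.shape
    x = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (B, H/2, W/2, C, 2, 2)
    return x.reshape(b, (h // 2) * (w // 2), c * 4)

def unpack_latents(packed: np.ndarray, c: int, h: int, w: int) -> np.ndarray:
    """Inverse of pack_latents: recover the (B, C, H, W) latent grid."""
    b = packed.shape[0]
    x = packed.reshape(b, h // 2, w // 2, c, 2, 2)
    x = x.transpose(0, 3, 1, 4, 2, 5)  # (B, C, H/2, 2, W/2, 2)
    return x.reshape(b, c, h, w)
```

Packing and unpacking are exact inverses, so a dedicated unpacking stage after denoising restores the spatial layout the VAE decoder expects.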
The core implementation changes appear solid. The model loading is made more robust, and mixed-precision handling in RMSNorm is improved. The new pipeline stages for Flux2 are well-integrated.
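On the RMSNorm mixed-precision point: a common pattern (sketched below in NumPy as an assumption about the kind of fix involved, not the PR's exact code) is to accumulate the mean of squares in float32 even when activations are half precision, then cast the result back, which avoids overflow and precision loss in the variance term.

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm with the usual mixed-precision trick: compute the root mean
    square in float32 regardless of the input dtype, then cast back."""
    orig_dtype = x.dtype
    xf = x.astype(np.float32)
    rms = np.sqrt(np.mean(xf * xf, axis=-1, keepdims=True) + eps)
    return ((xf / rms) * weight.astype(np.float32)).astype(orig_dtype)
```

The output keeps the caller's dtype (e.g. bfloat16/float16 in the model), while the reduction itself never happens in half precision.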
I have one comment on a duplicated function in one of the new comparison scripts. Otherwise, the port looks good.
def get_diffusers_prompt_embeds(
    prompt: str,
    model_id: str,
    device: str = "cuda",
    dtype: torch.dtype = torch.bfloat16,
) -> torch.Tensor:
    """Get Flux2 Klein prompt_embeds from diffusers (layers 9, 18, 27). Returns tensor on device."""
    try:
        from diffusers import Flux2KleinPipeline
    except ImportError:
        from diffusers.pipelines.flux2 import Flux2KleinPipeline
    pipe = Flux2KleinPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)
    prompt_embeds, _ = pipe.encode_prompt(
        prompt=prompt,
        device=device,
        num_images_per_prompt=1,
        max_sequence_length=512,
        text_encoder_out_layers=(9, 18, 27),
    )
    return prompt_embeds
The function get_diffusers_prompt_embeds is defined here and then again at line 256. This redefinition is likely unintentional and makes the code confusing. Although neither of these functions appears to be called in the current script, this duplication should be resolved to improve code clarity and prevent potential bugs if this code is used in the future. I suggest removing this first definition, and potentially the second one as well if it's confirmed to be dead code.
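The hazard the review describes is worth spelling out: in Python, a second `def` with the same name at the same scope silently rebinds the name at import time, so only the last definition is ever callable. A minimal demonstration (with a hypothetical name, not the script's actual function):

```python
def get_embeds():
    return "first definition"

def get_embeds():  # silently shadows the definition above; no warning is raised
    return "second definition"

print(get_embeds())  # prints "second definition"
```

Because the earlier body is unreachable, keeping both copies invites edits to the wrong one; linters such as flake8 flag this as an error (F811).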
In-progress Flux2 port with debug files.

To run the debug files:
Install diffusers from source
Have sglang checked out at the same directory level as the FastVideo repo in the pod