
Add LingBot-World support#1083

Draft
radicalyyyahaha wants to merge 3 commits into hao-ai-lab:main from radicalyyyahaha:lingbot-world-adapt

Conversation

@radicalyyyahaha

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @radicalyyyahaha, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request significantly expands the FastVideo framework's capabilities by integrating full support for LingBot-World models. It introduces a new image-to-video pipeline tailored for these models, refines the model configuration to accommodate their specific architectures, and makes the weight loading process more adaptable to various checkpoint formats. Additionally, it includes a dedicated conversion script to simplify the preparation of LingBot-World checkpoints and implements a performance enhancement for multi-process operations.

Highlights

  • LingBot-World Model Support: Introduced comprehensive support for LingBot-World models, including specific parameter name mappings for official checkpoints and a dedicated image-to-video pipeline.
  • Flexible Weight Loading: Enhanced the weight loading mechanism to be more robust, allowing for fallback to different file types (.pth, .bin) and intelligent extraction of tensor state dictionaries from various checkpoint wrappers, including stripping common prefixes for VAE weights.
  • New WanCamImageToVideoPipeline: Added a new WanCamImageToVideoPipeline designed for LingBot/Wan camera-control image-to-video checkpoints, which utilizes two transformer experts and a FlowUniPCMultistepScheduler.
  • Dynamic Model Configuration: Updated WanVideoArchConfig to include new optional parameters (dim, num_heads, in_dim, out_dim, model_type) and logic to dynamically adjust model dimensions and attention heads based on these inputs.
  • Performance Optimization: Optimized multi-process execution by avoiding unnecessary CPU transfers for CUDA IPC when sending result tensors, improving overall performance.
  • Checkpoint Conversion Script: Provided a new utility script to convert LingBot-World checkpoints into a FastVideo-compatible repository layout, streamlining the integration process for these models.
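The "intelligent extraction of tensor state dictionaries from various checkpoint wrappers" could look roughly like the sketch below. This is a hypothetical illustration based only on the PR description: the function name `extract_tensor_state_dict` comes from the changelog, but the wrapper-key list and the duck-typed tensor check are assumptions (real FastVideo code would check `isinstance(v, torch.Tensor)`).

```python
def _looks_like_tensor(value):
    # Duck-typed stand-in (assumption) so this sketch runs without torch;
    # the real loader would use isinstance(value, torch.Tensor).
    return hasattr(value, "dtype") and hasattr(value, "shape")

# Wrapper keys commonly seen in research checkpoints (assumed list).
_WRAPPER_KEYS = ("state_dict", "model", "module", "ema")

def extract_tensor_state_dict(obj):
    """Recursively unwrap a loaded checkpoint until a flat
    name -> tensor mapping is found; return None if there is none."""
    if isinstance(obj, dict):
        # Already a plain tensor state dict?
        if obj and all(_looks_like_tensor(v) for v in obj.values()):
            return obj
        # Otherwise descend into well-known wrapper keys.
        for key in _WRAPPER_KEYS:
            if key in obj:
                found = extract_tensor_state_dict(obj[key])
                if found is not None:
                    return found
    return None
```

The recursive descent means a checkpoint saved as `{"model": {"state_dict": {...}}}` and one saved as a bare state dict both resolve to the same mapping.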


Changelog
  • fastvideo/configs/models/dits/wanvideo.py
    • Added extensive parameter name mappings for official Wan/LingBot checkpoint naming conventions.
    • Introduced new optional configuration parameters (dim, num_heads, in_dim, out_dim, model_type) to WanVideoArchConfig.
    • Modified the __post_init__ method to dynamically set num_attention_heads, in_channels, out_channels, attention_head_dim, and hidden_size based on the newly added optional parameters.
  • fastvideo/models/loader/component_loader.py
    • Imported the new extract_tensor_state_dict utility function.
    • Expanded the allowed patterns for weight loading to include .pth files.
    • Added _class_name and architectures to the list of keys that are popped from the model_config during loading.
    • Refactored VAE weight loading to prioritize .safetensors and then fallback to .pt, .pth, or .bin files, including logic to strip common prefixes from VAE weights.
    • Implemented a warning mechanism to filter and log unsupported WanModel configuration fields.
    • Included WanModel in the list of models that permit non-strict weight loading.
  • fastvideo/models/loader/weight_utils.py
    • Modified pt_weights_iterator to handle torch.load gracefully, attempting weights_only=True first and falling back to weights_only=False for legacy checkpoints, followed by tensor state dictionary extraction.
    • Added a new function extract_tensor_state_dict to recursively find and extract a plain tensor state dictionary from various common checkpoint wrappers.
  • fastvideo/models/registry.py
    • Added 'WanModel' as an alias for 'WanTransformer3DModel' in the _IMAGE_TO_VIDEO_DIT_MODELS registry.
  • fastvideo/pipelines/basic/wan/wan_cam_i2v_pipeline.py
    • Added a new file defining WanCamImageToVideoPipeline, a camera-control image-to-video pipeline for LingBot/Wan models.
    • The pipeline is configured to use FlowUniPCMultistepScheduler and orchestrates various stages including input validation, prompt encoding, conditioning, timestep preparation, latent preparation, image VAE encoding, denoising (with two transformers), and decoding.
  • fastvideo/registry.py
  • Registered new Wan2_2_I2V_A14B_SamplingParam and Wan2_2_I2V_A14B_Config entries for robbyant/lingbot-world-base-cam, with pipeline detection keyed on the 'lingbot-world' and 'wancamimagetovideopipeline' identifiers.
  • fastvideo/worker/multiproc_executor.py
    • Modified worker_busy_loop to send output_batch.output directly via pipe, avoiding an unnecessary .cpu() transfer when using CUDA IPC for result tensors, which improves performance.
  • scripts/checkpoint_conversion/lingbot_world_to_fastvideo.py
    • Added a new script to convert LingBot-World checkpoints into a FastVideo-compatible repository layout.
    • This script handles copying and normalizing transformer configurations, managing tokenizer, text encoder, and VAE components, and generating a model_index.json file for proper integration.
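The `pt_weights_iterator` fallback described above is a two-step load: try the safe `weights_only=True` path first, and only retry with full unpickling for legacy checkpoints. A minimal sketch of that control flow, with the loader injected as a parameter so it can be exercised without torch installed (in FastVideo the loader would be `torch.load`; the helper name is hypothetical):

```python
def load_checkpoint_with_fallback(loader, path, map_location="cpu"):
    """Prefer the safe weights_only load; fall back to full unpickling
    only for legacy checkpoints that store non-tensor pickled objects."""
    try:
        return loader(path, map_location=map_location, weights_only=True)
    except Exception:
        # Legacy checkpoint: wrapper objects are rejected by the safe
        # loader, so retry with weights_only=False.
        return loader(path, map_location=map_location, weights_only=False)
```

Trying `weights_only=True` first matters because it refuses to execute arbitrary pickled code, so untrusted modern checkpoints never reach the unsafe path.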
Activity
  • The pull request was initiated by radicalyyyahaha with the goal of adding LingBot-World support.
  • The changes involve modifications to model configurations, weight loading utilities, and the introduction of a new pipeline and conversion script.
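The VAE prefix stripping mentioned in the changelog amounts to detecting a prefix shared by every key and removing it. A dependency-free sketch (the prefix list and function name are assumptions for illustration, not the actual FastVideo code):

```python
# Prefixes a converted VAE checkpoint might carry (assumed list).
COMMON_VAE_PREFIXES = ("vae.", "module.", "model.")

def strip_common_prefix(state_dict):
    """If every key shares one known prefix, return a copy with that
    prefix removed so keys match the target parameter names; otherwise
    return the state dict unchanged."""
    for prefix in COMMON_VAE_PREFIXES:
        if state_dict and all(k.startswith(prefix) for k in state_dict):
            return {k[len(prefix):]: v for k, v in state_dict.items()}
    return state_dict
```

Requiring that *every* key carries the prefix avoids mangling checkpoints where only some submodules happen to start with, say, `model.`.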

@gemini-code-assist (Contributor) left a comment
Code Review

This pull request adds support for LingBot-World models. The changes include new parameter mappings, more flexible model configuration and weight loading, a new pipeline, and a conversion script. The changes are generally good, improving compatibility and performance. However, I found a potential issue in the __post_init__ method of WanVideoArchConfig where hidden_size is being incorrectly overwritten, which could lead to model configuration errors. I've provided a suggestion to fix this.

Comment on lines 166 to 178

def __post_init__(self):
    if self.num_heads is not None:
        self.num_attention_heads = self.num_heads
    if self.in_dim is not None:
        self.in_channels = self.in_dim
    if self.out_dim is not None:
        self.out_channels = self.out_dim
    if self.dim is not None and self.num_attention_heads > 0:
        self.attention_head_dim = self.dim // self.num_attention_heads
        self.hidden_size = self.dim
    super().__post_init__()
    self.out_channels = self.out_channels or self.in_channels
    self.hidden_size = self.num_attention_heads * self.attention_head_dim
Severity: high

In the __post_init__ method, self.hidden_size is set on line 175 based on self.dim, but it's then unconditionally overwritten on line 178. This can lead to an incorrect hidden_size if self.dim is not a multiple of self.num_attention_heads due to integer division. For example, if dim was 5001 and num_attention_heads was 40, attention_head_dim would be 125, and hidden_size would be incorrectly set to 40 * 125 = 5000 instead of the intended 5001. The logic should be restructured to correctly prioritize self.dim for hidden_size when it's available.

Suggested change

Current:

def __post_init__(self):
    if self.num_heads is not None:
        self.num_attention_heads = self.num_heads
    if self.in_dim is not None:
        self.in_channels = self.in_dim
    if self.out_dim is not None:
        self.out_channels = self.out_dim
    if self.dim is not None and self.num_attention_heads > 0:
        self.attention_head_dim = self.dim // self.num_attention_heads
        self.hidden_size = self.dim
    super().__post_init__()
    self.out_channels = self.out_channels or self.in_channels
    self.hidden_size = self.num_attention_heads * self.attention_head_dim

Suggested:

def __post_init__(self):
    if self.num_heads is not None:
        self.num_attention_heads = self.num_heads
    if self.in_dim is not None:
        self.in_channels = self.in_dim
    if self.out_dim is not None:
        self.out_channels = self.out_dim
    super().__post_init__()
    if self.dim is not None and self.num_attention_heads > 0:
        self.attention_head_dim = self.dim // self.num_attention_heads
        self.hidden_size = self.dim
    else:
        self.hidden_size = self.num_attention_heads * self.attention_head_dim
    self.out_channels = self.out_channels or self.in_channels
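The rounding issue can be checked with the reviewer's own numbers. This is a standalone repro of the arithmetic, not FastVideo code:

```python
# Reviewer's example: a dim that is not a multiple of the head count.
dim, num_attention_heads = 5001, 40

# Integer division truncates the per-head dimension.
attention_head_dim = dim // num_attention_heads  # 5001 // 40 == 125

# Original order: hidden_size is recomputed from the truncated head dim
# after super().__post_init__(), losing the remainder.
buggy_hidden_size = num_attention_heads * attention_head_dim  # 40 * 125 == 5000

# Suggested order: hidden_size keeps the checkpoint's dim when it is given.
fixed_hidden_size = dim  # 5001
```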
