MagCache on Wan 2.2 Dual-Transformer Pipelines: Incorrect Step Accounting and Limited Effectiveness on a 4-Step Distilled Model (e.g. Wan2.2) #14025

Description

@fatemehgh1313

While evaluating MagCache on Wan 2.2, I found what appears to be a design issue in the current implementation for dual-transformer pipelines.

The current implementation assumes that a transformer executes exactly num_inference_steps times during inference. This assumption is valid for single-transformer architectures such as Flux, SDXL, and Wan 2.1, but it does not hold for Wan 2.2, where denoising is split between two transformers that each execute only during part of the diffusion process.

After implementing a prototype fix that assigns independent MagCache configurations to each transformer based on the number of timesteps it actually executes, I still observe poor practical results on a distilled 4-step Wan 2.2 model:

  • At thresholds below 0.3, MagCache provides little or no measurable speedup.

  • At thresholds greater than or equal to 0.3, runtime improves (~25%), but output quality degrades severely.

  • The same behavior occurs whether MagCache is applied only to the first transformer or to both transformers.

I am therefore opening this issue both as a bug report regarding the current dual-transformer implementation and as a request for feedback regarding the expected behavior of MagCache on heavily step-distilled Wan 2.2 models.


Background

According to the current MagCache documentation, calibration produces a set of mag_ratios that describe the model's intrinsic magnitude decay curve.

Example workflow:

calib_config = MagCacheConfig(
    calibrate=True,
    num_inference_steps=4,
)

After calibration, the reported ratios are passed back into the inference configuration:

mag_config = MagCacheConfig(
    mag_ratios=[...],
    num_inference_steps=4,
)

The documentation also states that ratios calibrated at higher step counts can be reused at lower step counts through interpolation.

For example, ratios obtained from a 50-step calibration can be interpolated and reused for a 20-step or lower-step inference schedule.
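
As a rough illustration of what reusing a longer calibration at a lower step count involves, the sketch below linearly resamples a ratio curve to a shorter length. It is a minimal sketch only: the helper name `interpolate_ratios` is hypothetical, the input curve is made up for illustration, and the library's own interpolation may use a different scheme.

```python
def interpolate_ratios(ratios, target_len):
    """Linearly resample `ratios` to `target_len` points, keeping both endpoints.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if target_len < 1 or not ratios:
        raise ValueError("need a non-empty ratio list and target_len >= 1")
    if target_len == 1 or len(ratios) == 1:
        return [float(ratios[0])] * target_len
    out = []
    # Map index i of the short curve onto a fractional position in the long one.
    scale = (len(ratios) - 1) / (target_len - 1)
    for i in range(target_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(ratios) - 1)
        frac = pos - lo
        out.append(ratios[lo] * (1.0 - frac) + ratios[hi] * frac)
    return out


# Illustrative curve only; these are NOT calibrated values.
ratios_50 = [1.0 - 0.004 * i for i in range(50)]
ratios_4 = interpolate_ratios(ratios_50, 4)
```

Resampling like this assumes the short schedule follows the same magnitude curve as the long one, only sampled more coarsely. That assumption may not hold once a step-distillation LoRA has changed the model's trajectory.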


Experimental Setup

The tested model does not use the original Wan 2.2 inference configuration.

The workflow is:

  1. Start from the original Wan 2.2 14B model (designed for approximately 50 inference steps).

  2. Apply a step-distillation LoRA to convert the model into a 4-step model.

  3. Evaluate MagCache on the resulting distilled model.

Therefore all observations below correspond to:

Wan 2.2 14B (50-step)
    ↓
Step Distillation LoRA
    ↓
4-step distilled model
    ↓
MagCache evaluation

I also tested using interpolated MagCache ratios derived from the original 50-step calibration, following the workflow described in the documentation.

The resulting outputs were significantly degraded on the distilled 4-step model.


Potential Bug: Dual-Transformer Step Accounting

Wan 2.2 divides denoising between two transformers:

  • transformer (high-noise stage)

  • transformer_2 (low-noise stage)

The split is determined by boundary_ratio.

For example, with a 4-step schedule:

Stage | Executions
-- | --
High-noise transformer | 2
Low-noise transformer | 2

Neither transformer executes 4 times.
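
The per-transformer execution counts can be derived from the scheduler timesteps. The sketch below assumes a step is routed to the high-noise transformer when its timestep is at or above `boundary_ratio * num_train_timesteps`. That routing rule, the helper name `split_timesteps`, and the example timesteps and boundary value are all assumptions for illustration and should be checked against the actual pipeline.

```python
def split_timesteps(timesteps, boundary_ratio, num_train_timesteps=1000):
    """Partition scheduler timesteps into (high_noise, low_noise) lists.

    Assumed rule: t >= boundary_ratio * num_train_timesteps goes to the
    high-noise transformer, everything else to the low-noise transformer.
    """
    boundary_timestep = boundary_ratio * num_train_timesteps
    high = [t for t in timesteps if t >= boundary_timestep]
    low = [t for t in timesteps if t < boundary_timestep]
    return high, low


# Hypothetical 4-step schedule and boundary, chosen only to illustrate a 2/2 split.
timesteps = [1000, 937, 833, 625]
high, low = split_timesteps(timesteps, boundary_ratio=0.875)

# Number of forward passes each transformer actually performs.
executions = {"transformer": len(high), "transformer_2": len(low)}
```

With these illustrative values each transformer runs twice, so neither reaches the global count of 4.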

However, MagCache currently receives:

num_inference_steps = 4

for both transformers.

Internally, calibration completion depends on:

if state.step_index >= self.config.num_inference_steps:

This implicitly assumes:

transformer executions == num_inference_steps

which is not true for Wan 2.2.

As a result, calibration and state management appear to be tied to the total diffusion step count rather than the number of executions of the individual transformer.
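
The effect of the mismatch can be reproduced with a toy model of the completion check. This is a minimal sketch, not the library's code: `ToyCalibrationState` and `run` are hypothetical, and the reset-on-completion behavior is an assumption made only to show what happens when the check never fires.

```python
class ToyCalibrationState:
    """Toy stand-in for per-transformer MagCache state (hypothetical)."""

    def __init__(self, num_inference_steps):
        self.num_inference_steps = num_inference_steps
        self.step_index = 0
        self.finished = False

    def on_forward(self):
        # One call per transformer forward pass.
        self.step_index += 1
        # Same comparison as the check quoted above.
        if self.step_index >= self.num_inference_steps:
            self.finished = True
            self.step_index = 0  # assumed: state resets once a run completes


def run(executions, configured_steps):
    """Simulate `executions` forward passes against a configured step count."""
    state = ToyCalibrationState(configured_steps)
    for _ in range(executions):
        state.on_forward()
    return state


# Transformer runs 2 times but is configured with the global count of 4.
global_count = run(executions=2, configured_steps=4)
# Transformer runs 2 times and is configured with its own count of 2.
per_model = run(executions=2, configured_steps=2)
```

In this toy model, `global_count.finished` stays `False` and its `step_index` is left at 2, so the next generation would start from stale state. `per_model` completes and resets as intended.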


Prototype Fix

To address this, I implemented a prototype that treats each transformer as an independent MagCache scope.

The scheduler timesteps are first split according to the pipeline's boundary_ratio.

Example:

4 diffusion steps

High-noise transformer: 2 steps
Low-noise transformer: 2 steps

Each transformer then receives:

  • its own MagCacheConfig

  • its own num_inference_steps

  • its own mag_ratios

  • its own MagCache state

instead of sharing a single global diffusion-step count.

Conceptually:

transformer
├── own state
├── own ratios
└── own step count

transformer_2
├── own state
├── own ratios
└── own step count

This guarantees that:

if state.step_index >= self.config.num_inference_steps:

is evaluated against the number of executions of that specific transformer rather than the total diffusion schedule.
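
A minimal sketch of how the per-transformer settings might be assembled is shown below. It only builds keyword arguments using the two field names that appear in the examples above (`num_inference_steps` and `mag_ratios`). The helper `build_scope_kwargs`, the placeholder ratios, and the requirement of one ratio per execution are assumptions for illustration, not a description of the prototype's actual code.

```python
def build_scope_kwargs(executions, ratios):
    """Build one set of MagCache keyword arguments per transformer.

    `executions` maps a pipeline attribute name to the number of forward
    passes that transformer performs; `ratios` maps the same names to
    ratios calibrated for that transformer alone.
    """
    kwargs = {}
    for name, steps in executions.items():
        scope_ratios = ratios[name]
        # Assumption: one ratio per execution of this transformer.
        if len(scope_ratios) != steps:
            raise ValueError(
                f"{name}: expected {steps} ratios, got {len(scope_ratios)}"
            )
        kwargs[name] = {
            "num_inference_steps": steps,
            "mag_ratios": list(scope_ratios),
        }
    return kwargs


executions = {"transformer": 2, "transformer_2": 2}
# Placeholder values only; real ratios must come from per-transformer calibration.
placeholder_ratios = {"transformer": [1.0, 1.0], "transformer_2": [1.0, 1.0]}
scope_kwargs = build_scope_kwargs(executions, placeholder_ratios)
```

Each entry could then be passed to its own config, for example `MagCacheConfig(**scope_kwargs["transformer"])`, and attached to the matching transformer so that no state or step count is shared between the two.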


Results

After implementing the per-transformer step accounting described above, I evaluated MagCache on the distilled 4-step Wan 2.2 model.

Threshold < 0.3

No meaningful speedup was observed.

Runtime remained very close to the baseline.

Threshold >= 0.3

Runtime improved from approximately:

40 s → 30 s

which corresponds to roughly a 25% reduction in runtime (about a 1.33× speedup).

However, output quality degraded significantly and became visibly worse than the baseline.

Transformer Coverage

I tested both:

  • applying MagCache only to transformer

  • applying MagCache to both transformer and transformer_2

The speed/quality tradeoff was broadly similar in both cases, with one noticeable difference:

  • Applying MagCache only to the first transformer (transformer) produced better output quality than applying it to both transformers, but gave no meaningful speedup: inference time matched that of the base model with the LoRA.

  • Even so, quality was still significantly worse than the baseline without MagCache.

This suggests that the second transformer may be more sensitive to MagCache reuse, or that the error introduced by caching accumulates differently across the two denoising stages.

Baseline (Wan 2.2 14B, 4-step distilled):
https://github.com/user-attachments/assets/01424605-874a-4635-b07e-4817111227c7

Baseline + MagCache applied to both transformers:
https://github.com/user-attachments/assets/22e52d25-a524-496d-9756-18df0942adcd

Baseline + MagCache applied to the first transformer only:
https://github.com/user-attachments/assets/4e99b509-b17f-4c1e-8463-90b273fa3420
