<h2>MagCache on Wan 2.2 Dual-Transformer Pipelines: Incorrect Step Accounting and Limited Effectiveness on a 4-Step Distilled Model (e.g. Wan2.2) #14025</h2>
While evaluating MagCache on Wan 2.2, I found what appears to be a design issue in the current implementation for dual-transformer pipelines.

The current implementation assumes that a transformer executes exactly <code inline="">num_inference_steps</code> times during inference. This assumption holds for single-denoiser pipelines such as Flux, SDXL, and Wan 2.1, but not for Wan 2.2, where denoising is split between two transformers that each execute during only part of the diffusion process.

After implementing a prototype fix that assigns independent MagCache configurations to each transformer based on the number of timesteps it actually executes, I still observe poor practical results on a distilled 4-step Wan 2.2 model:
<ul>
<li>At thresholds below 0.3, MagCache provides little or no measurable speedup.</li>
<li>At thresholds of 0.3 or higher, runtime improves (~25%), but output quality degrades severely.</li>
<li>Similar behavior occurs whether MagCache is applied only to the first transformer or to both transformers.</li>
</ul>
I am therefore opening this issue both as a bug report on the current dual-transformer implementation and as a request for feedback on the expected behavior of MagCache on heavily step-distilled Wan 2.2 models.
<hr>
<h2>Background</h2>
According to the current MagCache documentation, calibration produces a set of <code inline="">mag_ratios</code> that describe the model's intrinsic magnitude decay curve.

Example workflow:
<pre><code class="language-python">calib_config = MagCacheConfig(
    calibrate=True,
    num_inference_steps=4,
)
</code></pre>After calibration, the reported ratios are passed back into the inference configuration:<pre><code class="language-python">mag_config = MagCacheConfig(
    mag_ratios=[...],
    num_inference_steps=4,
)
</code></pre>
The documentation also states that ratios calibrated at higher step counts can be reused at lower step counts through interpolation. For example, ratios obtained from a 50-step calibration can be interpolated and reused for a 20-step or shorter inference schedule.
<hr>
<h2>Experimental Setup</h2>
The tested model does not use the original Wan 2.2 inference configuration. The workflow is:
<ol>
<li>Start from the original Wan 2.2 14B model (designed for approximately 50 inference steps).</li>
<li>Apply a step-distillation LoRA to convert the model into a 4-step model.</li>
<li>Evaluate MagCache on the resulting distilled model.</li>
</ol>
All observations below therefore correspond to:
<pre><code class="language-text">Wan 2.2 14B (50-step)
 ↓
Step Distillation LoRA
 ↓
4-step distilled model
 ↓
MagCache evaluation
</code></pre>
I also tested interpolated MagCache ratios derived from the original 50-step calibration, following the workflow described in the documentation. The resulting outputs were significantly degraded on the distilled 4-step model.
<hr>
<h2>Potential Bug: Dual-Transformer Step Accounting</h2>
Wan 2.2 divides denoising between two transformers:
<ul>
<li><code inline="">transformer</code> (high-noise stage)</li>
<li><code inline="">transformer_2</code> (low-noise stage)</li>
</ul>
The split is determined by <code inline="">boundary_ratio</code>. For example, with a 4-step schedule:
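<table>
<thead>
<tr><th>Stage</th><th>Executions</th></tr>
</thead>
<tbody>
<tr><td>High-noise transformer</td><td>2</td></tr>
<tr><td>Low-noise transformer</td><td>2</td></tr>
</tbody>
</table>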
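
To make the 2/2 split concrete, here is a minimal sketch of how the per-transformer execution counts could be derived from a schedule. It assumes a step is routed to the high-noise transformer when its timestep is at or above <code inline="">boundary_ratio * num_train_timesteps</code>, and to the low-noise transformer otherwise. The helper name <code inline="">split_steps_by_boundary</code>, the timestep values, and the <code inline="">boundary_ratio</code> value are illustrative assumptions, not taken from the pipeline or from my actual runs.

```python
def split_steps_by_boundary(timesteps, boundary_ratio, num_train_timesteps=1000):
    """Count how many denoising steps each transformer would execute.

    Assumption: a step goes to the high-noise transformer when
    t >= boundary_ratio * num_train_timesteps, else to the low-noise one.
    """
    boundary_timestep = boundary_ratio * num_train_timesteps
    high = sum(1 for t in timesteps if t >= boundary_timestep)
    low = len(timesteps) - high
    return {"transformer": high, "transformer_2": low}


# Illustrative values only: a made-up 4-step schedule and boundary_ratio.
timesteps = [999, 900, 600, 300]
counts = split_steps_by_boundary(timesteps, boundary_ratio=0.875)
print(counts)  # {'transformer': 2, 'transformer_2': 2}
```

The counts always sum to the total number of diffusion steps, so with any split that uses both transformers, each one runs fewer than <code inline="">num_inference_steps</code> times.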

Neither transformer executes 4 times. However, MagCache currently receives:
<pre><code class="language-python">num_inference_steps = 4
</code></pre>
for both transformers. Internally, calibration completion depends on:
<pre><code class="language-python">if state.step_index &gt;= self.config.num_inference_steps:
</code></pre>This implicitly assumes:<pre><code class="language-text">transformer executions == num_inference_steps
</code></pre>
which is not true for Wan 2.2. As a result, calibration and state management appear to be tied to the total diffusion step count rather than to the number of executions of the individual transformer.
<hr>
<h2>Prototype Fix</h2>
To address this, I implemented a prototype that treats each transformer as an independent MagCache scope. The scheduler timesteps are first split according to the pipeline's <code inline="">boundary_ratio</code>. Example:
<pre><code class="language-text">4 diffusion steps

High-noise transformer: 2 steps
Low-noise transformer: 2 steps
</code></pre>
Each transformer then receives:
<ul>
<li>its own <code inline="">MagCacheConfig</code></li>
<li>its own <code inline="">num_inference_steps</code></li>
<li>its own <code inline="">mag_ratios</code></li>
<li>its own MagCache state</li>
</ul>
instead of sharing a single global diffusion-step count. Conceptually:
<pre><code class="language-text">transformer
 ├── own state
 ├── own ratios
 └── own step count

transformer_2
 ├── own state
 ├── own ratios
 └── own step count
</code></pre>This guarantees that:<pre><code class="language-python">if state.step_index &gt;= self.config.num_inference_steps:
</code></pre>
is evaluated against the number of executions of that specific transformer rather than the total diffusion schedule.
<hr>
<h2>Results</h2>
After implementing the per-transformer step accounting described above, I evaluated MagCache on the distilled 4-step Wan 2.2 model.
<h3>Threshold &lt; 0.3</h3>
No meaningful speedup was observed. Runtime remained very close to the baseline.
<h3>Threshold &gt;= 0.3</h3>
Runtime improved from approximately:
<pre><code class="language-text">40 s → 30 s
</code></pre>
which corresponds to roughly a 25% reduction in runtime (about a 1.33x speedup). However, output quality degraded significantly and became visibly worse than the baseline.
<h3>Transformer Coverage</h3>
I tested both:
<ul>
<li>applying MagCache only to <code inline="">transformer</code></li>
<li>applying MagCache to both <code inline="">transformer</code> and <code inline="">transformer_2</code></li>
</ul>

The speed/quality tradeoff was broadly similar in both cases, with one noticeable difference:

<ul>
<li>Applying MagCache only to the first transformer (<code inline="">transformer</code>) produced better output quality than applying it to both transformers, but gave no meaningful speedup: inference time matched the base model with the LoRA.</li>
<li>Even so, quality was still significantly worse than the baseline without MagCache.</li>
</ul>

This suggests that the second transformer may be more sensitive to MagCache reuse, or that the error introduced by caching accumulates differently across the two denoising stages.
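
For reference, the step-accounting difference between the current behavior and the prototype can be illustrated with a toy model. This is only a sketch: <code inline="">ToyCacheScope</code> and <code inline="">run</code> are hypothetical stand-ins, not the real MagCache hook or state classes, and the sketch assumes the step index is incremented once per forward pass of the owning transformer before the completion check is evaluated.

```python
from dataclasses import dataclass


@dataclass
class ToyCacheScope:
    """Hypothetical stand-in for one transformer's MagCache config plus state."""

    num_inference_steps: int  # value the completion check compares against
    step_index: int = 0
    completed: bool = False

    def on_forward(self):
        # Same comparison as the check quoted earlier in this issue;
        # the increment-before-check ordering is an assumption.
        self.step_index += 1
        if self.step_index >= self.num_inference_steps:
            self.completed = True


def run(schedule, scopes):
    """Advance the scope of whichever transformer owns each diffusion step."""
    for owner in schedule:
        scopes[owner].on_forward()
    return {name: scope.completed for name, scope in scopes.items()}


# 4 diffusion steps split 2/2 between the two transformers.
schedule = ["transformer", "transformer", "transformer_2", "transformer_2"]

# Global accounting: both scopes are told num_inference_steps=4.
global_result = run(schedule, {
    "transformer": ToyCacheScope(num_inference_steps=4),
    "transformer_2": ToyCacheScope(num_inference_steps=4),
})

# Per-transformer accounting: each scope gets its own execution count.
scoped_result = run(schedule, {
    "transformer": ToyCacheScope(num_inference_steps=2),
    "transformer_2": ToyCacheScope(num_inference_steps=2),
})

print(global_result)  # {'transformer': False, 'transformer_2': False}
print(scoped_result)  # {'transformer': True, 'transformer_2': True}
```

Under these assumptions, the completion condition is never reached with the global count, while per-transformer counts let each scope finish after its own two executions.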

Baseline (Wan 2.2 14B, 4-step distilled):
https://github.com/user-attachments/assets/01424605-874a-4635-b07e-4817111227c7

Baseline + MagCache applied to both transformers:
https://github.com/user-attachments/assets/22e52d25-a524-496d-9756-18df0942adcd

Baseline + MagCache applied to the first transformer only:
https://github.com/user-attachments/assets/4e99b509-b17f-4c1e-8463-90b273fa3420
