[phyai] DiT cache support

## Summary

Add configurable cache support for phyai's DiT / diffusion transformer inference path. The goal is to reuse selected intermediate results across denoising steps, reduce repeated computation, and improve end-to-end generation latency.

DiT inference runs the same transformer blocks (attention and MLP layers) at every denoising step. Hidden states and residual updates often change little between adjacent steps, so much of this work is redundant. Existing acceleration methods exploit this through block-level caching, residual caching, first-block caching, timestep-level skipping, or Taylor-style approximation, trading a small, controllable quality change for lower latency.
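To make the first-block / residual idea concrete, here is a minimal sketch in plain Python. It is not phyai code: `FirstBlockCache`, `relative_l1`, and the list-of-floats "tensors" are hypothetical stand-ins, and the reuse rule (compare the first block's residual with the one from the last fully computed step) is one common formulation, assumed here for illustration.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def relative_l1(current: Sequence[float], previous: Sequence[float]) -> float:
    """L1 change between two vectors, relative to the L1 norm of `previous`."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    scale = sum(abs(p) for p in previous)
    return diff / scale if scale > 0 else float("inf")


class FirstBlockCache:
    """Hypothetical first-block cache over a stack of residual-style blocks."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.prev_first_residual: Optional[Vector] = None
        self.cached_tail_residual: Optional[Vector] = None
        self.skipped_steps = 0

    def forward(self, hidden: Vector, blocks: Sequence[Block]) -> Vector:
        # The first block always runs; its residual is the cheap change signal.
        first_out = blocks[0](hidden)
        first_residual = [o - h for o, h in zip(first_out, hidden)]

        can_reuse = (
            self.prev_first_residual is not None
            and self.cached_tail_residual is not None
            and relative_l1(first_residual, self.prev_first_residual) < self.threshold
        )
        if can_reuse:
            # Skip the remaining blocks and replay their cached contribution.
            self.skipped_steps += 1
            return [f + r for f, r in zip(first_out, self.cached_tail_residual)]

        # Full computation: run the remaining blocks and refresh both caches.
        out = first_out
        for block in blocks[1:]:
            out = block(out)
        self.prev_first_residual = first_residual
        self.cached_tail_residual = [o - f for o, f in zip(out, first_out)]
        return out
```

A larger `threshold` skips more steps and drifts further from the uncached output, which is the quality/speed trade-off described above.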

This issue should focus on inference-time, training-free cache support. Cache should be disabled by default and must not change existing generation behavior unless explicitly enabled. Once enabled, users should be able to control the cache strategy, target range, active steps, thresholds, and quality/speed trade-off through explicit configuration.
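One possible shape for that explicit configuration is sketched below. Every name here (`DiTCacheConfig`, `strategy`, `block_range`, `step_range`, `residual_threshold`) is a placeholder for discussion, not an existing phyai API, and the default threshold is an arbitrary value rather than a tuned recommendation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical strategy names; the real set would follow the MVP decision.
SUPPORTED_STRATEGIES = ("first_block", "block")


@dataclass(frozen=True)
class DiTCacheConfig:
    enabled: bool = False                          # cache stays off unless requested
    strategy: str = "first_block"
    block_range: Optional[Tuple[int, int]] = None  # [start, end) blocks; None = all
    step_range: Optional[Tuple[int, int]] = None   # [start, end) steps; None = all
    residual_threshold: float = 0.1                # placeholder, not a tuned default

    def __post_init__(self) -> None:
        if self.strategy not in SUPPORTED_STRATEGIES:
            raise ValueError(f"unknown cache strategy: {self.strategy!r}")
        if self.residual_threshold < 0:
            raise ValueError("residual_threshold must be non-negative")
        for name in ("block_range", "step_range"):
            bounds = getattr(self, name)
            if bounds is not None and not (0 <= bounds[0] < bounds[1]):
                raise ValueError(f"{name} must satisfy 0 <= start < end")

    def is_active(self, step: int) -> bool:
        """Return True if caching may be applied at this denoising step."""
        if not self.enabled:
            return False
        if self.step_range is None:
            return True
        start, end = self.step_range
        return start <= step < end
```

With this shape, `DiTCacheConfig()` is a no-op by construction, which keeps the "disabled by default" requirement easy to verify.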

## Motivation

- DiT-based models are expensive to run, especially for image and video generation where denoising requires many steps and deep transformer stacks.
- Users may accept a small quality shift in exchange for lower latency or higher throughput.
- Ecosystem projects such as diffusers, Cache-DiT, and SGLang already expose related acceleration mechanisms that can inform phyai's implementation and API design.
- Adding DiT cache support gives phyai a unified entry point for future inference optimizations across DiT-based pipelines such as FLUX, Wan, HunyuanVideo, Qwen-Image, and similar models.

## Goals

- Provide a unified DiT cache configuration entry point, for example:
  - `enable_dit_cache(...)`
  - `disable_dit_cache()`
  - or a `dit_cache` field in pipeline / model config.
- Support at least one basic cache strategy for the MVP, chosen from:
  - first-block / residual cache;
  - block-level cache;
  - an adapter around existing Cache-DiT / diffusers cache APIs.
- Keep cache disabled by default. Enabling cache should not change the original pipeline's prompt arguments, seed handling, scheduler behavior, dtype, or device placement semantics.
- Support a per-run cache lifecycle to prevent cache pollution across prompts, batches, shapes, or devices.
- Expose useful debug and metric information:
  - skipped block / step count;
  - latency;
  - peak memory;
  - quality regression comparison.
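The entry-point, per-run lifecycle, and metrics goals could fit together roughly as follows. This is a sketch under assumptions: `ToyPipeline`, `CacheRunState`, and `cache_run` are hypothetical names, and the pipeline is a stand-in object rather than a real phyai pipeline. Peak memory and quality comparison are left out because they depend on the device runtime and the evaluation setup.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass
class CacheRunState:
    """Cache buffers and counters that live for exactly one generation run."""
    buffers: Dict[str, Any] = field(default_factory=dict)
    computed_blocks: int = 0
    skipped_blocks: int = 0
    latency_s: float = 0.0

    @property
    def skip_ratio(self) -> float:
        total = self.computed_blocks + self.skipped_blocks
        return self.skipped_blocks / total if total else 0.0


class ToyPipeline:
    """Stand-in for a pipeline object; cache is disabled by default."""

    def __init__(self) -> None:
        self.dit_cache_enabled = False
        self.cache_state: Optional[CacheRunState] = None

    def enable_dit_cache(self) -> None:
        self.dit_cache_enabled = True

    def disable_dit_cache(self) -> None:
        self.dit_cache_enabled = False
        self.cache_state = None


@contextmanager
def cache_run(pipe: ToyPipeline) -> Iterator[Optional[CacheRunState]]:
    """Give each run a fresh state and drop its buffers when the run ends."""
    if not pipe.dit_cache_enabled:
        yield None  # disabled: callers take the unmodified code path
        return
    state = CacheRunState()
    pipe.cache_state = state
    start = time.perf_counter()
    try:
        yield state
    finally:
        state.latency_s = time.perf_counter() - start
        state.buffers.clear()    # nothing survives into the next prompt or batch
        pipe.cache_state = None  # counters remain readable on `state`
```

Scoping the state to a context manager means a failed or interrupted run still clears its buffers, so a later call with a different prompt, shape, or device cannot pick up stale tensors.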

## References

- Hugging Face diffusers cache API: https://huggingface.co/docs/diffusers/api/cache
- Cache-DiT: https://github.com/vipshop/cache-dit
- SGLang DiT caching acceleration docs: https://docs.sglang.io/docs/sglang-diffusion/caching-acceleration

