[phyai] SageAttention Backend #13

@chenghuaWang

Summary

Add support for SageAttention as a selectable backend in PhyAI's Diffusion Attention system.

Motivation

SageAttention provides optimized quantized attention kernels for diffusion models, with reported speedups over FlashAttention while preserving output quality. Adding SageAttention would give PhyAI users another high-performance attention backend for diffusion/DiT workloads, especially video and image generation models where attention cost is significant.

Proposed Scope

  • Add a new diffusion attention backend option, e.g. SAGE_ATTN or sage.
  • Lazily import sageattention so it remains an optional dependency.
  • Provide a clear error message when the backend is selected but SageAttention is not installed or the GPU/CUDA environment is unsupported.
  • Support the common diffusion attention path, especially non-causal self-attention/cross-attention.
  • Preserve existing fallback behavior for unsupported shapes, dtypes, masks, or devices.
  • Add unit tests and, if available, a small benchmark/validation path comparing against the existing SDPA/FlashAttention backend.
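A minimal sketch of how the lazy import, the error message, and the fallback rule above could fit together. Everything named here (`AttnBackend`, `load_sageattn`, `can_use_sage`, `select_attention`, the supported-dtype set) is hypothetical and not taken from PhyAI's code. The `sageattn` entry point is assumed from the SageAttention package and should be checked against the installed version. Dtypes and devices are passed as plain strings so the sketch runs without PyTorch.

```python
import enum
import importlib
from typing import Any, Callable


class AttnBackend(enum.Enum):
    # Hypothetical option names; the issue only suggests "SAGE_ATTN or sage".
    SDPA = "sdpa"
    SAGE_ATTN = "sage"


# Assumption: the quantized kernels are only used for half-precision inputs.
SUPPORTED_DTYPES = frozenset({"float16", "bfloat16"})


def load_sageattn(module_name: str = "sageattention") -> Callable[..., Any]:
    """Import the optional package on first use and return its kernel."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Attention backend '{AttnBackend.SAGE_ATTN.value}' was selected, "
            f"but the optional package '{module_name}' could not be imported. "
            "Install SageAttention or select a different backend."
        ) from exc
    # Assumed entry point name; verify against the installed package.
    return module.sageattn


def can_use_sage(*, dtype: str, device_type: str, has_mask: bool) -> bool:
    """Return True only for inputs this sketch treats as supported."""
    return (
        device_type == "cuda"
        and dtype in SUPPORTED_DTYPES
        and not has_mask
    )


def select_attention(
    backend: AttnBackend,
    *,
    dtype: str,
    device_type: str,
    has_mask: bool,
    fallback: Callable[..., Any],
    loader: Callable[[], Callable[..., Any]] = load_sageattn,
) -> Callable[..., Any]:
    """Pick the attention callable, keeping the existing path as fallback."""
    if backend is AttnBackend.SAGE_ATTN and can_use_sage(
        dtype=dtype, device_type=device_type, has_mask=has_mask
    ):
        # The import happens here, so users of other backends never pay for it.
        return loader()
    # Unsupported dtype, mask, or device: reuse the existing backend.
    return fallback
```

With this split, a missing package raises an error only when the SageAttention path is actually taken, while unsupported inputs silently reuse the existing backend. The `loader` parameter lets unit tests substitute a stub kernel without a GPU.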

Labels: help wanted