[phyai] SageAttention Backend

## Summary

Add support for [SageAttention](https://github.com/thu-ml/SageAttention) as a selectable backend in PhyAI's Diffusion Attention system.

## Motivation

SageAttention provides optimized quantized attention kernels for diffusion models, with reported speedups over FlashAttention while preserving output quality. Adding SageAttention would give PhyAI users another high-performance attention backend for diffusion/DiT workloads, especially video and image generation models where attention cost is significant.

## Proposed Scope

- Add a new diffusion attention backend option, e.g. `SAGE_ATTN` or `sage`.
- Lazily import `sageattention` so it remains an optional dependency.
- Provide a clear error message when the backend is selected but SageAttention is not installed or the GPU/CUDA environment is unsupported.
- Support the common diffusion attention path, especially non-causal self-attention/cross-attention.
- Preserve existing fallback behavior for unsupported shapes, dtypes, masks, or devices.
- Add unit tests and, if available, a small benchmark/validation path comparing against the existing SDPA/FlashAttention backend.
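
The lazy-import, error-message, and fallback items above could be sketched roughly as follows. This is only an illustration, not PhyAI's actual code: `AttnBackend`, `load_optional_module`, `resolve_backend`, and the `SAGE_SUPPORTED_DTYPES` set are hypothetical names, and the specific fallback conditions (masks and non-half-precision dtypes fall back to SDPA) are assumptions that would need to be checked against the installed SageAttention version.

```python
import importlib
from enum import Enum
from types import ModuleType


class AttnBackend(Enum):
    """Hypothetical backend enum; the real PhyAI names may differ."""
    SDPA = "sdpa"
    SAGE_ATTN = "sage"


# Assumption: only half-precision inputs are routed to SageAttention.
SAGE_SUPPORTED_DTYPES = {"float16", "bfloat16"}


def load_optional_module(name: str = "sageattention") -> ModuleType:
    """Import an optional dependency only when it is actually needed."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Re-raise with an actionable message instead of a bare import failure.
        raise ImportError(
            f"Attention backend requires the optional package '{name}', "
            "which could not be imported. Install it or select another backend."
        ) from exc


def resolve_backend(
    requested: AttnBackend,
    *,
    has_mask: bool,
    dtype: str,
    module_name: str = "sageattention",
) -> AttnBackend:
    """Pick the backend to run, falling back to SDPA for unsupported inputs."""
    if requested is not AttnBackend.SAGE_ATTN:
        return requested
    # Unsupported inputs keep the existing fallback path (assumed to be SDPA).
    if has_mask or dtype not in SAGE_SUPPORTED_DTYPES:
        return AttnBackend.SDPA
    # Supported inputs: the package must be importable, otherwise fail loudly.
    load_optional_module(module_name)
    return AttnBackend.SAGE_ATTN
```

Checking input support before importing means that a masked or full-precision call never touches the optional dependency, while a supported call on a machine without the package fails with the explicit message rather than silently changing backends.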

## References

- SageAttention repository: https://github.com/thu-ml/SageAttention
