
Commit ff28ca5

gchalump authored and meta-codesync[bot] committed
Build time optimize (part 2) (pytorch#5000)
Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2014

Pull Request resolved: pytorch#5000

Continuous attempt to improve build time. Part 1: D83523333

This diff breaks the files down further so that there is one instantiation per file.

### Updated Structure

- **Template Headers**:
  - `blackwell_fmha_bwd_template.cuh`: Template definition only
  - `blackwell_fmha_fwd_template.cuh`: Template definition only
- **Instantiation Files** (one instantiation per file):
  - 74 files following the naming convention `blackwell_fmha_{fwd|bwd}_hdim{64|128}_{fp16|bf16|fp8}_{varlen|novarlen}_{mask}_{det}_sm100.cu`
  - Examples:
    - `blackwell_fmha_fwd_hdim128_fp16_novarlen_nomask_sm100.cu`
    - `blackwell_fmha_bwd_hdim128_bf16_novarlen_nodet_causal_sm100.cu`
    - `blackwell_fmha_fwd_hdim64_fp8_varlen_residual_sm100.cu`

Differential Revision: D84100982
1 parent 2674b39 commit ff28ca5
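To illustrate the split the commit describes, here is a minimal, self-contained sketch of the one-explicit-instantiation-per-file pattern. The template, kernel, and parameter names below (`fmha_fwd_kernel`, `run_fmha_fwd`, `kHeadDim`, and so on) are hypothetical stand-ins; the real definitions live in `blackwell_fmha_fwd_template.cuh` / `blackwell_fmha_bwd_template.cuh` and are not shown on this page.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// ---- Would live in a template header such as blackwell_fmha_fwd_template.cuh ----
// (definition only, no instantiations; signatures here are illustrative, not FBGEMM's)
template <int kHeadDim, typename Element, bool kIsVarlen>
__global__ void fmha_fwd_kernel(const Element* q, const Element* k,
                                const Element* v, Element* out, int seq_len) {
  // Attention math elided; the point is the template structure, not the kernel body.
}

template <int kHeadDim, typename Element, bool kIsVarlen>
void run_fmha_fwd(const Element* q, const Element* k, const Element* v,
                  Element* out, int seq_len, cudaStream_t stream) {
  fmha_fwd_kernel<kHeadDim, Element, kIsVarlen>
      <<<dim3(1), dim3(128), 0, stream>>>(q, k, v, out, seq_len);
}

// ---- Would live in one instantiation file, e.g.
// blackwell_fmha_fwd_hdim128_fp16_novarlen_nomask_sm100.cu ----
// Exactly one explicit instantiation per translation unit.
template void run_fmha_fwd<128, __half, false>(
    const __half*, const __half*, const __half*, __half*, int, cudaStream_t);
```

Keeping a single head-dim / dtype / variant combination per `.cu` file trades more translation units for higher build parallelism, and an edit to one variant recompiles only that file rather than one large monolithic instantiation unit.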

File tree

82 files changed: +3492 additions, -1203 deletions


fbgemm_gpu/experimental/gen_ai/src/attention/cuda/cutlass_blackwell_fmha/blackwell_fmha_bwd.cu

Lines changed: 881 additions & 0 deletions

fbgemm_gpu/experimental/gen_ai/src/attention/cuda/cutlass_blackwell_fmha/blackwell_fmha_bwd_bf16_inst.cu

Lines changed: 0 additions & 223 deletions
This file was deleted.
