@Aya-ZIbra
Contributor

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2017

Add a stand-alone Blackwell decode op.
Supported mask:
BlockDiagonalCausalWithOffsetPaddedKeysMask
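
The PR does not spell out the mask semantics. As a rough illustration only, assuming the mask behaves like the xFormers mask of the same name (queries may only attend to keys in their own block, padded key slots are ignored, and causality is aligned to the last valid key), the predicate could be sketched as below. `BlockInfo` and `can_attend` are hypothetical names, not code from this PR.

```cpp
// Illustrative sketch only: hypothetical names, not taken from the PR.
// Assumes xFormers-style semantics for BlockDiagonalCausalWithOffsetPaddedKeysMask.
struct BlockInfo {
  int num_queries;  // number of (unpadded) queries in this block
  int k_seqlen;     // number of valid keys in this block; later slots are padding
};

// Returns true if query q_idx of block q_block may attend to key slot k_idx
// of block k_block. Indices are 0-based within their block.
constexpr bool can_attend(int q_block, int q_idx, int k_block, int k_idx,
                          BlockInfo info) {
  return q_block == k_block                 // block-diagonal: same block only
      && k_idx >= 0 && k_idx < info.k_seqlen  // skip padded key slots
      // causal with offset: the last query lines up with the last valid key
      && k_idx <= q_idx + (info.k_seqlen - info.num_queries);
}
```

Under these assumptions, the decode case (one query per block) reduces to attending to every valid key of the same block and nothing in the padded area.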

Differential Revision: D84630701

Aya Ibrahim added 3 commits October 9, 2025 15:16
Summary:
This diff updates the code to enable BF16 with the latest CUTLASS version.

The changes update `blackwell_gen_impl.cu` and `collective/sm100_fmha_gen_mainloop_warpspecialized.hpp` to support the BF16 data type.

`fmha.hpp` also adds a check that shared memory (SMEM) usage does not exceed the available capacity.
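
The diff itself is not shown here, so the following is only a minimal sketch of what a compile-time SMEM capacity check can look like. The storage layout, the tile sizes, and the 227 KiB budget are assumptions for illustration; `SharedStorageSketch`, `fits_in_smem`, and `kAssumedSmemCapacityBytes` are hypothetical names, not identifiers from `fmha.hpp`.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed per-block shared-memory budget in bytes (227 KiB). This is an
// illustrative value; the real limit should come from the target architecture.
constexpr std::size_t kAssumedSmemCapacityBytes = 227 * 1024;

// Hypothetical stand-in for a kernel's shared-storage layout: one Q tile plus
// multi-stage K and V tiles.
template <typename Element, int TileQ, int TileK, int HeadDim, int Stages>
struct SharedStorageSketch {
  Element q[TileQ * HeadDim];
  Element k[Stages][TileK * HeadDim];
  Element v[Stages][TileK * HeadDim];
};

// True if the storage type fits in the assumed budget.
template <typename Storage>
constexpr bool fits_in_smem() {
  return sizeof(Storage) <= kAssumedSmemCapacityBytes;
}

// 16-bit element used as a stand-in for BF16 in this sketch.
using Bf16Bits = std::uint16_t;
using Storage = SharedStorageSketch<Bf16Bits, 128, 128, 128, 2>;

// Fails the build, rather than the kernel launch, if the layout is too large.
static_assert(fits_in_smem<Storage>(), "SMEM usage exceeds the assumed capacity");
```

Checking at compile time means an oversized tile or stage configuration is rejected before any kernel is launched.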

Differential Revision: D84624233
@netlify

netlify bot commented Oct 15, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 7e3b0db
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68f11fc12c214300084d9fd1
😎 Deploy Preview https://deploy-preview-5004--pytorch-fbgemm-docs.netlify.app
@meta-codesync
Contributor

meta-codesync bot commented Oct 15, 2025

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84630701.

Aya-ZIbra added a commit to Aya-ZIbra/FBGEMM that referenced this pull request Oct 16, 2025
Summary:
Pull Request resolved: pytorch#5004

X-link: https://github.com/facebookresearch/FBGEMM/pull/2017

Add a stand-alone Blackwell decode op.
Supported mask:
   BlockDiagonalCausalWithOffsetPaddedKeysMask

Differential Revision: D84630701