- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.1k
(Naga) Cooperative Matrix Support #8251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
881da16    to
    430d104      
    Compare
  
    | Haven't actually looked in the PR yet, but you should take a look at the presentation about cooperative matrices from the F2F: https://docs.google.com/presentation/d/1wiy3-ar58ah1W9Qc5trd0gG7fwCo93IJ9YCtQoR6W6c/edit?slide=id.g30fc39156ff_0_0#slide=id.g30fc39156ff_0_0 and the dawn design doc https://dawn.googlesource.com/dawn/+/refs/heads/main/docs/dawn/features/subgroup_matrix.md just to make sure things are synced up with upstream. | 
| @cwfitzgerald this is very useful, thanks for linking! Funny to see the timing of that presentation roughly matching when I started working on it, independently. I looked at the slides as well as the design doc, and here is my first feedback. Apologies if it's not thought through enough! 
 There are very similar types - textures and sampler - which also are very abstract from the shader writer point of view. Was it considered to just use the "Handle" storage class? 
 There is a choice for each of them: scope, role (left/right/acc), type, etc, to be either a generic argument or a part of the name. In this PR, for example, the role is encoded as a generic A/B/C. I think that makes sense because it allows to express operations like matrix store cleanly as generic instead of overloaded for all kinds of the matrix. Similarly, the "subgroup" part. If we had it as a generic scope, it could also use it in other parts of the language/API (e.g. barriers). 
 A boolean argument is generally a bad API pattern, since the call site has no clue about what it means from just looking at the invocation. Since this is supposed to be a constant anyway, maybe this is a good application for including this into the function name itself? This PR is currently exposing it as  Overall, looks reasonable. Curious if Apple had concerns about some parts as well. | 
| @cwfitzgerald @jimblandy do you have a strong preference on how to proceed with the changes? I'm at the point where things basically work, and the test is validating correctly. We could: 
 I'm fine either way. I just want to use this for a project and will be on a branch if I'm not able to merge. My preference would be (1). | 
2bf7828    to
    782a0fc      
    Compare
  
    | Ok, I've got  | 
| I think it's our standard practice to land experimental things, so I think it's okay for us to review and land this as-is. However, the WebGPU committee will almost certainly approve some version of Alan's proposal, eventually, so if we put something different in wgpu, it will just need to be changed. So, I'd like to really encourage you to adapt what you've got to Alan's proposal as much as feasible, but we shouldn't block merging on 100% compliance. | 
| Yeah landing this as is is fine. Once the proposal lands against WebGPU @kvark could you raise concerns against the gpuweb repo, we're not really involved in the proposal here. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good! I have some concerns, mainly around testing.
- Wiring this up with wgpu shouldn't be too difficult as it's a pure shader-side change and wiring that up would let us write a few runtime tests, which would go a long way to make me feel confident that this feature is actually working (and ensures it stays working).
- Some testing of error handling in the wgsl frontend would be great, particularly in the wgsl_errors.rstest file. There's some interesting new surface to test here, so getting some coverage there would be great.
- Finally, this is strictly non-blocking but would be great, if you can, wiring up spirv-in would be nice as it would give access to this feature to people not using wgsl. From what I can see of the spirv backend it shouldn't be too difficult, though that's a guess.
| pub(crate) const COOPERATIVE_LOAD_FUNCTION: &str = "NagaCooperativeLoad"; | ||
| pub(crate) const COOPERATIVE_MULTIPLY_ADD_FUNCTION: &str = "NagaCooperativeMultiplyAdd"; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These need to be added to the list of predefined constants passed to the namer. I believe it's a big list in a standalone file in each backend.
| //Note: technically can do `FMul` but IR doesn't have matrix per-component multiplication | ||
| | (Dimension::CooperativeMatrix, _) | ||
| | (_, Dimension::CooperativeMatrix) => { | ||
| unimplemented!() | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to validate this out in the validator?
| return Err(Error::FeatureNotImplemented( | ||
| "Copperative load/store out-of-bounds handling", | ||
| )); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we file an issue about this? Will this fail if we try to use this with bounds checking enabled? If so this would mean this feature is useless with a normal shader module.
| let memory_model = if self | ||
| .capabilities_used | ||
| .contains(&spirv::Capability::VulkanMemoryModel) | ||
| { | ||
| spirv::MemoryModel::Vulkan | ||
| } else { | ||
| spirv::MemoryModel::GLSL450 | ||
| }; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cause problems for any of the other behavior that naga relies on - i.e. is the vulkan memory model backwards compatible?
| self.as_global().ensure_type_exists(None, inner) | ||
| } | ||
|  | ||
| fn _get_runtime_expression(&self, expr: Handle<ir::Expression>) -> &ir::Expression { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up with this function?
| Ok((ty, span)) | ||
| } | ||
|  | ||
| /// Parses `<T,R>`, returning (T, span of T, R, span of R) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to actually return the span of the cooperative role.
| god_mode = true | ||
|  | ||
| [spv] | ||
| debug = true | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we skip debug output here, just to keep snapshots less verbose
| @@ -0,0 +1,9 @@ | |||
| targets = "IR | SPIRV | METAL | WGSL" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep the IR output long term, again just thinking about snapshot size.
| metal::simdgroup_float8x8 NagaCooperativeLoad(const device float* ptr, int stride, bool is_row_major) { | ||
| metal::simdgroup_float8x8 m; | ||
| simdgroup_load(m, ptr, stride, 0, is_row_major); | ||
| return m; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a helper function to do this? Could you add the exploitation to the MSL helper function code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rather well crafted example to test all kinds of features, could you annotate it with comments to show what exactly you're testing and where?
Connections
Blocked by gfx-rs/rspirv#265
Since rspirv fails validation of the product, even though it's correct.
Description
Adding shader support for KHR_cooperative_matrix. Considering a rather simple scope that is portable between Vulkan and Metal.
Testing
Adds tests.
Squash or Rebase?
Rebase.
Checklist
cargo fmt.taplo format.cargo clippy --tests. If applicable, add:--target wasm32-unknown-unknowncargo xtask testto run tests.CHANGELOG.mdentry.API choices
SPIRV and Metal have a fine intersection of the cooperative matrix functionality, with some caveats:
coop_mat.OpCooperativeMatrixLoadKHRandOpCooperativeMatrixMulAddKHRas expressions andOpCooperativeMatrixStoreKHRas a statement. Metal has all of them 3 as statements. I followed SPIR-V notion here, as does Google's proposal.Things left for follow-up: